diff --git a/notebooks/11_temporal_probability_models/assets/hmm.jpg b/notebooks/11_temporal_probability_models/assets/hmm.jpg new file mode 100644 index 00000000..333e7226 Binary files /dev/null and b/notebooks/11_temporal_probability_models/assets/hmm.jpg differ diff --git a/notebooks/11_temporal_probability_models/assets/particle-filter-example.jpg b/notebooks/11_temporal_probability_models/assets/particle-filter-example.jpg new file mode 100644 index 00000000..afbe77f1 Binary files /dev/null and b/notebooks/11_temporal_probability_models/assets/particle-filter-example.jpg differ diff --git a/notebooks/11_temporal_probability_models/assets/robot_localization_intro.jpg b/notebooks/11_temporal_probability_models/assets/robot_localization_intro.jpg new file mode 100644 index 00000000..dbba83b7 Binary files /dev/null and b/notebooks/11_temporal_probability_models/assets/robot_localization_intro.jpg differ diff --git a/notebooks/11_temporal_probability_models/assets/umb-ex.jpg b/notebooks/11_temporal_probability_models/assets/umb-ex.jpg new file mode 100644 index 00000000..83ee92d9 Binary files /dev/null and b/notebooks/11_temporal_probability_models/assets/umb-ex.jpg differ diff --git a/notebooks/11_temporal_probability_models/index.html b/notebooks/11_temporal_probability_models/index.html new file mode 100644 index 00000000..eefb5434 --- /dev/null +++ b/notebooks/11_temporal_probability_models/index.html @@ -0,0 +1,550 @@ + + + +
+Hidden Markov Models can be applied to part-of-speech tagging. Part-of-speech tagging is a fully supervised learning task, because we have a corpus of words labeled with the correct part-of-speech tag. But many applications don't have labeled data. So in this note, we introduce some of the algorithms for HMMs, including the key unsupervised learning algorithm for HMMs, the Forward-Backward algorithm.
+Filtering is the task of computing the belief state, that is, the posterior distribution over the most recent state, given all evidence to date. Filtering is also called state estimation. We wish to compute $P(X_t \mid e_{1:t})$.
+*Figure: Bayesian network structure and conditional distributions describing the umbrella world.*
In the umbrella example, this would mean computing the probability of rain today, given all the observations of the umbrella carrier made so far. Filtering is what a rational agent does to keep track of the current state so that rational decisions can be made. It turns out that an almost identical calculation provides the likelihood of the evidence sequence, $P(e_{1:t})$.
+A useful filtering algorithm needs to maintain a current state estimate and update it, rather than going back over the entire history of percepts for each update. (Otherwise, the cost of each update increases as time goes by.) In other words, given the result of filtering up to time $t$, the agent needs to compute the result for $t+1$ from the new evidence $e_{t+1}$,
+$$P(X_{t+1} \mid e_{1:t+1}) = f\big(e_{t+1},\, P(X_t \mid e_{1:t})\big)$$
+for some function $f$. This process is called recursive estimation. We can view the calculation as being composed of two parts: first, the current state distribution is projected forward from $t$ to $t+1$; then it is updated using the new evidence $e_{t+1}$. This two-part process emerges quite simply when the formula is rearranged:
+$$
+\begin{aligned}
+P(X_{t+1} \mid e_{1:t+1}) &= P(X_{t+1} \mid e_{1:t}, e_{t+1}) && \text{(dividing up the evidence)} \\
+&= \alpha\, P(e_{t+1} \mid X_{t+1}, e_{1:t})\, P(X_{t+1} \mid e_{1:t}) && \text{(Bayes' rule)} \\
+&= \alpha\, P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t}) && \text{(sensor Markov assumption)}
+\end{aligned}
+$$
+Here $\alpha$ is a normalizing constant used to make probabilities sum up to 1. The second term, $P(X_{t+1} \mid e_{1:t})$, represents a one-step prediction of the next state, and the first term updates this with the new evidence; notice that $P(e_{t+1} \mid X_{t+1})$ is obtainable directly from the sensor model.
+Now we obtain the one-step prediction for the next state by conditioning on the current state $X_t$:
+$$P(X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})$$
+Within the summation, the first factor comes from the transition model and the second comes from the current state distribution. Hence, we have the desired recursive formulation. We can think of the filtered estimate $P(X_t \mid e_{1:t})$ as a “message” $f_{1:t}$ that is propagated forward along the sequence, modified by each transition and updated by each new observation. The process is given by
+$$f_{1:t+1} = \alpha\, \operatorname{FORWARD}(f_{1:t}, e_{t+1})$$
+where FORWARD implements the update described in the previous equation and the process begins with $f_{1:0} = P(X_0)$. When all the state variables are discrete, the time for each update is constant (i.e., independent of $t$), and the space required is also constant.
Let us illustrate the filtering process for two steps in the basic umbrella example. That is, we will compute $P(R_2 \mid u_{1:2})$ as follows:
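+The step-by-step computation below is a sketch assuming the standard umbrella-world parameters, $P(R_t \mid R_{t-1}) = 0.7$, $P(R_t \mid \lnot R_{t-1}) = 0.3$, $P(u_t \mid R_t) = 0.9$, $P(u_t \mid \lnot R_t) = 0.2$, and a uniform prior $P(R_0) = \langle 0.5, 0.5\rangle$ (these numbers are consistent with the filtered estimate 0.818 quoted in the smoothing example later in this note):
+$$
+\begin{aligned}
+P(R_1) &= \textstyle\sum_{r_0} P(R_1 \mid r_0)\, P(r_0) = \langle 0.7, 0.3\rangle \cdot 0.5 + \langle 0.3, 0.7\rangle \cdot 0.5 = \langle 0.5, 0.5\rangle \\
+P(R_1 \mid u_1) &= \alpha\, P(u_1 \mid R_1)\, P(R_1) = \alpha\, \langle 0.9, 0.2\rangle \langle 0.5, 0.5\rangle \approx \langle 0.818, 0.182\rangle \\
+P(R_2 \mid u_1) &= \textstyle\sum_{r_1} P(R_2 \mid r_1)\, P(r_1 \mid u_1) \approx \langle 0.627, 0.373\rangle \\
+P(R_2 \mid u_1, u_2) &= \alpha\, P(u_2 \mid R_2)\, P(R_2 \mid u_1) = \alpha\, \langle 0.9, 0.2\rangle \langle 0.627, 0.373\rangle \approx \langle 0.883, 0.117\rangle
+\end{aligned}
+$$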
+Prediction is the task of computing the posterior distribution over a future state, given all evidence to date. That is, we wish to compute $P(X_{t+k} \mid e_{1:t})$ for some $k > 0$. In the umbrella example, this might mean computing the probability of rain three days from now, given all the observations to date. Prediction is useful for evaluating possible courses of action based on their expected outcomes.
+The task of prediction can be seen simply as filtering without the addition of new evidence. In fact, the filtering process already incorporates a one-step prediction, and it is easy to derive the following recursive computation for predicting the state at $t+k+1$ from a prediction for $t+k$:
+$$P(X_{t+k+1} \mid e_{1:t}) = \sum_{x_{t+k}} P(X_{t+k+1} \mid x_{t+k})\, P(x_{t+k} \mid e_{1:t})$$
+Naturally, this computation involves only the transition model and not the sensor model. It is interesting to consider what happens as we try to predict further and further into the future. It can be shown that the predicted distribution for rain converges to a fixed point $\langle 0.5, 0.5\rangle$, after which it remains constant for all time. This is the stationary distribution of the Markov process defined by the transition model.
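+As a quick numerical illustration, the following Python sketch iterates the prediction equation with the umbrella world's transition model (assuming the standard illustrative values $P(R_{t+1} \mid R_t) = 0.7$ and $P(R_{t+1} \mid \lnot R_t) = 0.3$) and shows the predicted distribution approaching $\langle 0.5, 0.5\rangle$:
+
+import numpy as np
+
+# Transition matrix T[i, j] = P(X_{t+1} = j | X_t = i), states ordered (rain, no rain).
+# Illustrative umbrella-world numbers, used only to demonstrate convergence.
+T = np.array([[0.7, 0.3],
+              [0.3, 0.7]])
+
+belief = np.array([0.9, 0.1])    # an arbitrary starting distribution P(X_t | e_{1:t})
+for k in range(1, 11):
+    belief = belief @ T          # one-step prediction: sum_x P(X' | x) P(x | e_{1:t})
+    print(f"k={k:2d}  P(rain in k days) = {belief[0]:.4f}")
+# The printed probability approaches 0.5, the stationary distribution.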
Smoothing is the task of computing the posterior distribution over a past state, given all evidence up to the present. That is, we wish to compute $P(X_k \mid e_{1:t})$ for some $k$ such that $1 \le k < t$. In the umbrella example, it might mean computing the probability that it rained last Wednesday, given all the observations of the umbrella carrier made up to today. Smoothing provides a better estimate of the state than was available at the time, because it incorporates more evidence.
+In anticipation of another recursive message-passing approach, we can split the computation into two parts: the evidence up to $k$ and the evidence from $k+1$ to $t$,
+$$
+\begin{aligned}
+P(X_k \mid e_{1:t}) &= P(X_k \mid e_{1:k}, e_{k+1:t}) \\
+&= \alpha\, P(X_k \mid e_{1:k})\, P(e_{k+1:t} \mid X_k, e_{1:k}) \\
+&= \alpha\, P(X_k \mid e_{1:k})\, P(e_{k+1:t} \mid X_k) \\
+&= \alpha\, f_{1:k} \times b_{k+1:t}
+\end{aligned}
+$$
+where “$\times$” represents pointwise multiplication of vectors. Here we have defined a “backward” message $b_{k+1:t} = P(e_{k+1:t} \mid X_k)$, analogous to the forward message $f_{1:k}$. The forward message can be computed by filtering forward from 1 to $k$. It turns out that the backward message can be computed by a recursive process that runs backward from $t$:
+$$
+\begin{aligned}
+P(e_{k+1:t} \mid X_k) &= \sum_{x_{k+1}} P(e_{k+1:t} \mid X_k, x_{k+1})\, P(x_{k+1} \mid X_k) \\
+&= \sum_{x_{k+1}} P(e_{k+1:t} \mid x_{k+1})\, P(x_{k+1} \mid X_k) \\
+&= \sum_{x_{k+1}} P(e_{k+1} \mid x_{k+1})\, P(e_{k+2:t} \mid x_{k+1})\, P(x_{k+1} \mid X_k)
+\end{aligned}
+$$
+where the last step follows by the conditional independence of $e_{k+1}$ and $e_{k+2:t}$, given $x_{k+1}$. Of the three factors in this summation, the first and third are obtained directly from the model, and the second is the “recursive call.” Using the message notation, we have
+$$b_{k+1:t} = \operatorname{BACKWARD}(b_{k+2:t}, e_{k+1})$$
+where BACKWARD implements the update described in the previous equation. As with the forward recursion, the time and space needed for each update are constant and thus independent of $t$.
Let us now apply this algorithm to the umbrella example, computing the smoothed estimate for the probability of rain at time $k=1$, given the umbrella observations on days 1 and 2. This is given by
+$$P(R_1 \mid u_1, u_2) = \alpha\, P(R_1 \mid u_1)\, P(u_2 \mid R_1)$$
+The first term we already know to be $\langle 0.818, 0.182\rangle$, from the forward filtering process described earlier. The second term can be computed by applying the backward recursion:
+$$P(u_2 \mid R_1) = \sum_{r_2} P(u_2 \mid r_2)\, P(u_{3:2} \mid r_2)\, P(r_2 \mid R_1) = (0.9 \times 1 \times \langle 0.7, 0.3\rangle) + (0.2 \times 1 \times \langle 0.3, 0.7\rangle) = \langle 0.69, 0.41\rangle$$
+Using the previous equation, we find that the smoothed estimate for rain on day 1 is
+$$P(R_1 \mid u_1, u_2) = \alpha\, \langle 0.818, 0.182\rangle \times \langle 0.69, 0.41\rangle \approx \langle 0.883, 0.117\rangle$$
+Thus, the smoothed estimate for rain on day 1 (0.883) is higher than the filtered estimate (0.818) in this case. This is because the umbrella on day 2 makes it more likely to have rained on day 2; in turn, because rain tends to persist, that makes it more likely to have rained on day 1.
Given a sequence of observations, we might wish to find the sequence of states that is most likely to have generated those observations.
+A Markov chain is useful when we need to compute a probability for a sequence of observable events. In many cases, however, the events we are interested in are hidden: we don’t observe them directly.
+A hidden Markov model (HMM) allows us to talk about both observed events (like words that we see in the input) and hidden events (like part-of-speech tags) that we think of as causal factors in our probabilistic model.
*Figure: A hidden Markov model for relating numbers of ice creams eaten (the observations) to the weather (H or C, the hidden variables).*
Hidden Markov models are characterized by three fundamental problems: computing the likelihood of an observation sequence, decoding the best hidden state sequence, and learning the HMM parameters. We consider each in turn.
+The first problem is to compute the likelihood of a particular observation sequence. For example, given the ice-cream eating HMM, what is the probability of the sequence 3 1 3? More formally:
+Computing Likelihood: Given an HMM $\lambda = (A, B)$ and an observation sequence $O$, determine the likelihood $P(O \mid \lambda)$.
Let’s start with a slightly simpler situation. Suppose we already knew the weather and wanted to predict how much ice cream Jason would eat. This is a useful part of many HMM tasks. For a given hidden state sequence (e.g., hot hot cold), we can easily compute the output likelihood of 3 1 3.
+Let’s see how. First, recall that for hidden Markov models, each hidden state produces only a single observation. Thus, the sequence of hidden states and the sequence of observations have the same length.
+Given this one-to-one mapping and the Markov assumption that the probability of a particular state depends only on the previous state, for a particular hidden state sequence $Q = q_1, q_2, \dots, q_T$ and an observation sequence $O = o_1, o_2, \dots, o_T$, the likelihood of the observation sequence is:
+$$P(O \mid Q) = \prod_{i=1}^{T} P(o_i \mid q_i)$$
+The computation of the joint probability of our ice-cream observation 3 1 3 and one possible hidden state sequence hot hot cold is as follows:
+$$P(3\ 1\ 3, \text{hot hot cold}) = P(\text{hot} \mid \text{start}) \times P(\text{hot} \mid \text{hot}) \times P(\text{cold} \mid \text{hot}) \times P(3 \mid \text{hot}) \times P(1 \mid \text{hot}) \times P(3 \mid \text{cold})$$
+Now that we know how to compute the joint probability of the observations with a particular hidden state sequence, we can compute the total probability of the observations just by summing over all possible hidden state sequences:
+$$P(O) = \sum_{Q} P(O, Q) = \sum_{Q} P(O \mid Q)\, P(Q)$$
+For our particular case, we would sum over the eight 3-event sequences cold cold cold, cold cold hot, and so on; that is,
+$$P(3\ 1\ 3) = P(3\ 1\ 3, \text{cold cold cold}) + P(3\ 1\ 3, \text{cold cold hot}) + P(3\ 1\ 3, \text{hot hot cold}) + \dots$$
+For an HMM with $N$ hidden states and an observation sequence of $T$ observations, there are $N^T$ possible hidden sequences. For real tasks, where $N$ and $T$ are both large, $N^T$ is a very large number, so we cannot compute the total observation likelihood by computing a separate observation likelihood for each hidden state sequence and then summing them.
+Instead of using such an exponential-time approach, we use an efficient algorithm called the forward algorithm. The forward algorithm is a kind of dynamic programming algorithm, that is, an algorithm that uses a table to store intermediate values as it builds up the probability of the observation sequence. The forward algorithm computes the observation probability by summing over the probabilities of all possible hidden state paths that could generate the observation sequence, but it does so efficiently by implicitly folding each of these paths into a single forward trellis.
Each cell of the forward algorithm trellis, $\alpha_t(j)$, represents the probability of being in state $j$ after seeing the first $t$ observations, given the automaton $\lambda$. The value of each cell is computed by summing over the probabilities of every path that could lead us to this cell. Formally, each cell expresses the following probability:
+$$\alpha_t(j) = P(o_1, o_2, \dots, o_t, q_t = j \mid \lambda)$$
+Here, $q_t = j$ means “the $t$-th state in the sequence of states is state $j$.” We compute this probability by summing over the extensions of all the paths that lead to the current cell. For a given state $q_j$ at time $t$, the value $\alpha_t(j)$ is computed as
+$$\alpha_t(j) = \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\, b_j(o_t)$$
+The three factors that are multiplied in this equation in extending the previous paths to compute the forward probability at time $t$ are:
+
+* $\alpha_{t-1}(i)$: the previous forward path probability from the previous time step;
+* $a_{ij}$: the transition probability from previous state $q_i$ to current state $q_j$;
+* $b_j(o_t)$: the state observation likelihood of the observation symbol $o_t$ given the current state $j$.
The algorithm proceeds in three steps: initialization, recursion, and termination; these appear as comments in the pseudocode below.
+The pseudocode of the forward algorithm:
+function FORWARD(observations of len T, state-graph of len N) returns forward-prob
+  create a probability matrix forward[N,T]
+  for each state s from 1 to N do                          ; initialization step
+    forward[s,1] = pi(s) * b_s(o_1)
+  for each time step t from 2 to T do                      ; recursion step
+    for each state s from 1 to N do
+      forward[s,t] = sum(forward[j,t-1] * a_{j,s} * b_s(o_t) for j=1 to N)
+  forward-prob = sum(forward[s,T] for s=1 to N)            ; termination step
+  return forward-prob
+
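+To make the procedure concrete, here is a small runnable sketch in Python. The transition and emission numbers below are illustrative values for the ice-cream HMM (two hidden states, Hot and Cold, emitting how many ice creams are eaten), not values given in this note; a brute-force sum over all hidden sequences is included as a sanity check.
+
+import itertools
+import numpy as np
+
+# Illustrative ice-cream HMM: states 0=Hot, 1=Cold; observations are 1, 2, or 3 ice creams.
+pi = np.array([0.8, 0.2])                    # initial state distribution pi(s)
+A = np.array([[0.6, 0.4],                    # A[i, j] = P(next state j | current state i)
+              [0.5, 0.5]])
+B = np.array([[0.2, 0.4, 0.4],               # B[s, o-1] = P(observation o | state s)
+              [0.5, 0.4, 0.1]])
+
+def forward(obs):
+    """Return P(obs | lambda) using the forward trellis."""
+    alpha = pi * B[:, obs[0] - 1]            # initialization
+    for o in obs[1:]:                        # recursion
+        alpha = (alpha @ A) * B[:, o - 1]
+    return alpha.sum()                       # termination
+
+def brute_force(obs):
+    """Sum P(obs, Q) over every possible hidden state sequence Q (exponential)."""
+    total = 0.0
+    for Q in itertools.product(range(2), repeat=len(obs)):
+        p = pi[Q[0]] * B[Q[0], obs[0] - 1]
+        for t in range(1, len(obs)):
+            p *= A[Q[t - 1], Q[t]] * B[Q[t], obs[t] - 1]
+        total += p
+    return total
+
+obs = [3, 1, 3]
+print(forward(obs), brute_force(obs))   # the two values agree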
+For any model, such as an HMM, that contains hidden variables, the task of determining which sequence of variables is the underlying source of some sequence of observations is called the decoding task. In the ice-cream domain, given a sequence of ice-cream observations 3 1 3 and an HMM, the task of the decoder is to find the best hidden weather sequence (H H H). More formally,
+Decoding: Given as input an HMM $\lambda = (A, B)$ and a sequence of observations $O = o_1, o_2, \dots, o_T$, find the most probable sequence of states $Q = q_1 q_2 q_3 \dots q_T$.
The most common decoding algorithm for HMMs is the Viterbi algorithm. Like the forward algorithm, Viterbi is a kind of dynamic programming algorithm that makes use of a dynamic programming trellis.
+The idea is to process the observation sequence left to right, filling out the trellis. Each cell of the trellis, $v_t(j)$, represents the probability that the HMM is in state $j$ after seeing the first $t$ observations and passing through the most probable state sequence $q_1, \dots, q_{t-1}$, given the automaton $\lambda$. The value of each cell is computed by recursively taking the most probable path that could lead us to this cell. Formally, each cell expresses the probability
+$$v_t(j) = \max_{q_1, \dots, q_{t-1}} P(q_1, \dots, q_{t-1}, o_1, o_2, \dots, o_t, q_t = j \mid \lambda)$$
Note that we represent the most probable path by taking the maximum over all possible previous state sequences. Like other dynamic programming algorithms, Viterbi fills each cell recursively. Given that we had already computed the probability of being in every state at time $t-1$, we compute the Viterbi probability by taking the most probable of the extensions of the paths that lead to the current cell. For a given state $q_j$ at time $t$, the value $v_t(j)$ is computed as
+$$v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)$$
+The three factors that are multiplied in this equation for extending the previous paths to compute the Viterbi probability at time $t$ are:
+
+* $v_{t-1}(i)$: the previous Viterbi path probability from the previous time step;
+* $a_{ij}$: the transition probability from previous state $q_i$ to current state $q_j$;
+* $b_j(o_t)$: the state observation likelihood of the observation symbol $o_t$ given the current state $j$.
The pseudocode of the Viterbi algorithm:
+function VITERBI(observations of len T, state-graph of len N) returns best-path, path-prob
+  create a path probability matrix viterbi[N,T]
+  for each state s from 1 to N do                          ; initialization step
+    viterbi[s,1] = pi(s) * b_s(o_1)
+    backpointer[s,1] = 0
+  for each time step t from 2 to T do                      ; recursion step
+    for each state s from 1 to N do
+      viterbi[s,t] = max(viterbi[j,t-1] * a_{j,s} * b_s(o_t) for j=1 to N)
+      backpointer[s,t] = argmax(viterbi[j,t-1] * a_{j,s} * b_s(o_t) for j=1 to N)
+  bestpathprob = max(viterbi[s,T] for s=1 to N)            ; termination step
+  bestpathpointer = argmax(viterbi[s,T] for s=1 to N)
+  bestpath = the path starting at state bestpathpointer that follows backpointer[] back in time
+  return bestpath, bestpathprob
+
+Note that the Viterbi algorithm is identical to the forward algorithm except that it takes the max over the previous path probabilities, whereas the forward algorithm takes the sum, and that it keeps backpointers so that the most probable state sequence can be recovered at the end.
+We turn to the third problem for HMMs: learning the parameters of an HMM, that is, the $A$ and $B$ matrices. Formally,
+Learning: Given an observation sequence $O$ and the set of possible states in the HMM, learn the HMM parameters $A$ and $B$.
The input to such a learning algorithm would be an unlabeled sequence of observations $O$ and a vocabulary of potential hidden states $Q$. Thus, for the ice-cream task, we would start with a sequence of ice-cream observations and the set of hidden states $H$ (hot) and $C$ (cold).
+The standard algorithm for HMM training is the forward-backward, or Baum-Welch algorithm, a special case of the Expectation-Maximization or EM algorithm.
+The algorithm will let us train both the transition probabilities $A$ and the emission probabilities $B$ of the HMM. EM is an iterative algorithm: it computes an initial estimate for the probabilities, then uses those estimates to compute a better estimate, and so on, iteratively improving the probabilities that it learns.
To understand the algorithm, we need to define a useful probability related to the forward probability, called the backward probability. The backward probability $\beta_t(i)$ is the probability of seeing the observations from time $t+1$ to the end, given that we are in state $i$ at time $t$ (and given the automaton $\lambda$):
+$$\beta_t(i) = P(o_{t+1}, o_{t+2}, \dots, o_T \mid q_t = i, \lambda)$$
+It is computed inductively in a similar manner to the forward algorithm.
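+Concretely (a sketch following the standard construction), the induction initializes $\beta_T(i) = 1$ for every state $i$ and then works backward through the observation sequence:
+$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad t = T-1, T-2, \dots, 1$$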
Here is the pseudocode of the forward-backward algorithm used for smoothing:
+function FORWARD_BACKWARD(ev, prior) returns a vector of probability distributions
+ inputs: ev, a vector of evidence values for steps 1,...,t
+ prior, the prior distribution on the initial state, P(X0)
+ local variables: fv, a vector of forward messages for steps 0,...,t
+ b, a representation of the backward message, initially all 1s
+ sv, a vector of smoothed estimates for steps 1,...,t
+ fv[0] = prior
+ for i = 1 to t do
+ fv[i] = FORWARD(fv[i − 1], ev[i])
+ for i = t downto 1 do
+ sv[i] = NORMALIZE(fv[i] * b)
+ b = BACKWARD(b, ev[i])
+ return sv
+
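+The following Python sketch implements this smoothing procedure for a generic discrete HMM; the umbrella-world numbers at the bottom are the standard illustrative parameters assumed earlier, so the smoothed day-1 estimate should come out near 0.883.
+
+import numpy as np
+
+def normalize(v):
+    return v / v.sum()
+
+def forward_step(f, e_likelihood, T):
+    # f_{1:t+1} = alpha * P(e_{t+1} | X_{t+1}) * sum_x P(X_{t+1} | x) f_{1:t}(x)
+    return normalize(e_likelihood * (T.T @ f))
+
+def backward_step(b, e_likelihood, T):
+    # b_{k+1:t} = sum_x' P(e_{k+1} | x') b_{k+2:t}(x') P(x' | X_k)
+    return T @ (e_likelihood * b)
+
+def forward_backward(evidence, prior, T, sensor):
+    # evidence: list of observed values; sensor[e] = vector P(e | X) over states
+    fv = [prior]
+    for e in evidence:
+        fv.append(forward_step(fv[-1], sensor[e], T))
+    sv = [None] * len(evidence)
+    b = np.ones_like(prior)
+    for i in range(len(evidence), 0, -1):
+        sv[i - 1] = normalize(fv[i] * b)
+        b = backward_step(b, sensor[evidence[i - 1]], T)
+    return sv
+
+# Umbrella world (states ordered [rain, not rain]); standard illustrative parameters.
+T = np.array([[0.7, 0.3], [0.3, 0.7]])          # P(X_{t+1} | X_t)
+sensor = {True: np.array([0.9, 0.2]),           # P(umbrella | X)
+          False: np.array([0.1, 0.8])}          # P(no umbrella | X)
+prior = np.array([0.5, 0.5])
+print(forward_backward([True, True], prior, T, sensor))
+# -> smoothed P(Rain_1), P(Rain_2); the day-1 estimate is about [0.883, 0.117]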
+The forward algorithm gives us exact inference in an HMM. As with Bayesian networks, we can also have approximate inference. Particle filtering is a sampling method for approximate inference in HMMs.
+Consider the robot localization problem. Assume the map has $m$ cells, where $m$ is a very large number; the belief vector then has $m$ entries. So, when we have a gigantic map (not to mention that it could be continuous!), we get a gigantic belief vector that takes a lot of time and resources to work with. Apart from that, as time passes the belief vector becomes extremely sparse (lots of its elements become very close to zero), which leads to useless computations that end up at zero every time. This is where a sampling method such as particle filtering comes in handy.
+Consider the robot localization problem again. Let's say we have $N$ particles. Each particle is a guess, a hypothesis about where the robot could be at that specific time. In fact, each particle is a sampled value of the state of the problem (in this case, the position of the robot on the map).
+This approach has three major steps: elapsing time, observing, and resampling. These steps map to the passage-of-time, observation, and normalization steps of the forward algorithm, respectively. The main idea of the algorithm is to keep hypotheses about which state we are in (in the case of robot localization, where the robot is) and to update these hypotheses with the passage of time and with new observations, so that our guesses about the current state remain valid and strong. For better intuition, keep the robot localization problem in mind for the steps below.
+At the very beginning of the algorithm, when we have no clue about the state, we initialize our particles to be spread uniformly over the state space (the robot could be anywhere with equal chance).
+First, similar to the forward algorithm, we move our samples to new states by sampling from the transition probabilities. The intuition behind this step is that for each guess about the robot's location, we make another guess about where it could be at the next step, sampling from the transition probability at that point on the map to create a new sample (particle) corresponding to the previous one (for each particle, of course). Note that this transition could also be deterministic. At the end of this step, we have another set of guesses, based on the previous ones, that is one step ahead in time. For each particle $x$ we do ($x'$ is the next state, i.e., the next place on the map):
+$$x' \sim P(X' \mid x)$$
and $x'$ will be our new particle in the set.
+Now the robot has a new observation. We score every guess produced in the last step against the new observation (i.e., give each a weight) based on the emission probability, which we have in an HMM, so we know how strong each guess is after the new observation (similar to likelihood weighting). We give a weight to each particle $x$ after observing evidence $e$:
+$$w(x) = P(e \mid x)$$
Be aware that we don't sample anything here; the particles stay fixed. Also note that the weights won't sum to one, since we are down-weighting almost every particle (a particle that is fully consistent with the evidence may, given how the weight is calculated, even get a weight of one).
+Working with weights can be frustrating for our little robot (!), and some weights can converge to zero after a few iterations, so, based on how probable and strong our particles were, we generate a new set of particles. This is done by sampling from the weighted particle distribution $N$ times (so the size of the particle set remains the same). The stronger a particle is, the more probable it is to be sampled into the new particle set. After this step we have a new set of particles distributed according to the strengths calculated in the observation step, which keeps the sample frequencies valid. Then we go back to the "Elapse Time" step.
+That's all, folks! First we have a set of particles. Based on where each one is, we guess where it would be one step ahead in time. The robot then makes an observation. We score (weight) the guesses to know how probable each is after the observation, and resample based on the weights to normalize the particles. We repeat these steps again and again until we converge.
+*Figure: An example of a full particle filtering process.*
function PARTICLE_FILTERING(e, N, dbn) returns a set of samples for the next time step
+ inputs: e, the new incoming evidence
+ N, the number of samples to be maintained
+ dbn, a DBN with prior P(X0), transition model P(X1 | X0), sensor model P(E1 | X1)
+ persistent: S, a vector of samples of size N, initially generated from P(X0)
+ local variables: W, a vector of weights of size N
+
+ for i = 1 to N do
+ S[i] ← sample from P(X1 | X0 = S[i]) /* step 1 */
+ W[i] ← P(e | X1 = S[i]) /* step 2 */
+ S ← WEIGHTED_SAMPLE_WITH_REPLACEMENT(N, S, W) /* step 3 */
+ return S
+
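+Below is a small runnable Python sketch of these three steps for a discrete HMM; the two-state umbrella world and its standard illustrative parameters are used again, so the particle estimate can be compared against the exact filtered value (about 0.883 after observing the umbrella on two consecutive days).
+
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# Umbrella world again (states 0=rain, 1=no rain); illustrative parameters.
+T = np.array([[0.7, 0.3], [0.3, 0.7]])        # transition model P(X' | X)
+sensor = {True: np.array([0.9, 0.2]),         # P(umbrella | X)
+          False: np.array([0.1, 0.8])}
+
+def particle_filter(evidence, n_particles=5000):
+    particles = rng.integers(0, 2, size=n_particles)          # uniform initialization
+    for e in evidence:
+        # 1) Elapse time: sample each particle's successor from the transition model.
+        particles = np.array([rng.choice(2, p=T[x]) for x in particles])
+        # 2) Observe: weight each particle by the likelihood of the evidence.
+        weights = sensor[e][particles]
+        # 3) Resample: draw N particles with probability proportional to the weights.
+        particles = rng.choice(particles, size=n_particles, p=weights / weights.sum())
+    return particles
+
+particles = particle_filter([True, True])
+print("estimated P(rain_2 | u_1, u_2) ~", np.mean(particles == 0))   # close to 0.883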
+Here are two YouTube videos that explain the subject very well (see [4] and [5] in the references):
+ +Robot localization is the process of determining where a mobile robot is located with respect to its environment. Localization is one of the most fundamental competencies required by an autonomous robot as the knowledge of the robot’s own location is an essential precursor to making decisions about future actions. In a typical robot localization scenario, a map of the environment is available and the robot is equipped with sensors that observe the environment as well as monitor its own motion. The localization problem then becomes one of estimating the robot position and orientation within the map using information gathered from these sensors. Robot localization techniques need to be able to deal with noisy observations and generate not only an estimate of the robot location but also a measure of the uncertainty of the location estimate.
+Robot localization provides an answer to the question: Where is the robot now? A reliable solution to this question is required for performing useful tasks, as the knowledge of current location is essential for deciding what to do next. The problem then becomes one of estimating the robot pose (position and orientation) relative to the coordinate frame in which the map is defined. Typically, the information available for computing the robot location is gathered using onboard sensors, while the robot uses these sensors to observe its environment and its own motion. Given the space limitations, alternative scenarios where sensors such as surveillance cameras are placed in the environment to observe the robot or the robot is equipped with a receiver that provides an estimate of its location based on information from an external source (e.g., a Global Positioning System (GPS) that uses satellites orbiting the earth) are excluded from the following discussion.
+A mobile robot equipped with sensors to monitor its own motion (e.g., wheel encoders and inertial sensors) can compute an estimate of its location relative to where it started if a mathematical model of the motion is available. This is known as odometry or dead reckoning. The errors present in the sensor measurements and the motion model make robot location estimates obtained from dead reckoning more and more unreliable as the robot navigates in its environment. Errors in dead reckoning estimates can be corrected when the robot can observe its environment using sensors and is able to correlate the information gathered by these sensors with the information contained in a map.
+The formulation of the robot localization problem depends on the type of the map available as well as on the characteristics of the sensors used to observe its environment. In one possible formulation, the map contains locations of some prominent landmarks or features present in the environment, and the robot is able to measure the range and/or bearing to these features relative to the robot. Alternatively, the map could be in the form of an occupancy grid that provides the occupied and free regions of an environment, and the sensors on board the robot measure the distance to the nearest occupied region in a given direction. As the information from sensors is usually corrupted by noise, it is necessary to estimate not only the robot location but also a measure of the uncertainty associated with the location estimate. Knowledge of the reliability of the location estimate plays an important role in the decision-making processes used in mobile robots, as catastrophic consequences may follow if decisions are made assuming that the location estimates are perfect when they are uncertain. Bayesian filtering is a powerful technique that can be applied to obtain an estimate of the robot location and the associated uncertainty.
+The localization problem in a landmark-based map is to find the robot pose $x_k$ at time $k$, i.e., to estimate
+$$P(x_k \mid z_{1:k}, u_{1:k}, m)$$
+given the map $m$, the sequence of robot actions $u_{1:k}$, and the sensor observations $z_{1:k}$ from time 1 to time $k$.
+In its most fundamental form, the problem is to estimate the robot poses that best agree with all robot actions and all sensor observations. This can be formulated as a nonlinear least-squares problem using the motion and observation models introduced below. The solution to the resulting optimization problem can then be calculated using an iterative scheme such as Gauss-Newton to obtain the robot trajectory and, as a consequence, the current robot pose. (Appendices of the source article give the details of how both linear and nonlinear least-squares problems can be solved, and how the localization problem can be formulated as one.) The dimensionality of the problem grows linearly with time (three pose variables per time step for two-dimensional motion), and given that the sampling rates of modern sensors are on the order of tens of hertz, this strategy quickly becomes computationally intractable.
If the noises associated with the sensor measurements can be approximated using Gaussian distributions, and an initial estimate for the robot location at time 0, described using a Gaussian distribution with known mean $\hat{x}_0$ and covariance $P_0$, is available (in this article, $\hat{x}$ is used to denote the estimated value of $x$), an approximate solution to this nonlinear least-squares problem can be obtained using an extended Kalman filter (EKF). The EKF effectively summarizes all the measurements obtained in the past in the estimate of the current robot location and its covariance matrix. When a new observation from the sensor becomes available, the current robot location estimate and its covariance are updated to reflect the new information gathered. The essential steps of the EKF-based localization algorithm are described in the following.
+Let $x_k$ denote the robot pose (position and orientation) at time $k$.
+Then the nonlinear process model (from time $k$ to time $k+1$) can be written in compact form as
+$$x_{k+1} = f(x_k, u_k) + w_k$$
+where $f(\cdot)$ is the system transition function, $u_k$ is the control input, and $w_k$ is the zero-mean Gaussian process noise with covariance $Q$.
+Consider the general case where more than one landmark is observed. Representing all the observations together as a single vector $z_{k+1}$, and all the observation noises together as a single vector $v_{k+1}$, the observation model can also be written in compact form as
+$$z_{k+1} = h(x_{k+1}) + v_{k+1}$$
+where $h(\cdot)$ is the observation function and $v_{k+1}$ is the zero-mean Gaussian observation noise with covariance $R$.
+Let the best estimate of the robot pose at time $k$ be
+$$\hat{x}_{k|k}, \quad P_{k|k}$$
+Then the localization problem becomes one of estimating, at time $k+1$,
+$$\hat{x}_{k+1|k+1}, \quad P_{k+1|k+1}$$
+where the estimate and its covariance are updated using the information gathered by the sensors. The EKF framework achieves this as follows; to maintain clarity, only the basic equations are presented.
+Predict using process model:
+$$\hat{x}_{k+1|k} = f(\hat{x}_{k|k}, u_k)$$
+$$P_{k+1|k} = \nabla f_x\, P_{k|k}\, \nabla f_x^{\top} + \nabla f_w\, Q\, \nabla f_w^{\top}$$
+where $\nabla f_x$ is the Jacobian of the function $f$ with respect to the state $x$ and $\nabla f_w$ is the Jacobian of $f$ with respect to the noise $w$, both evaluated at $\hat{x}_{k|k}$.
+Update using observation:
+$$\hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + W\nu$$
+$$P_{k+1|k+1} = P_{k+1|k} - W S W^{\top}$$
+where the innovation is $\nu = z_{k+1} - h(\hat{x}_{k+1|k})$, and the innovation covariance $S$ and the Kalman gain $W$ are given by
+$$S = \nabla h_x\, P_{k+1|k}\, \nabla h_x^{\top} + R$$
+$$W = P_{k+1|k}\, \nabla h_x^{\top}\, S^{-1}$$
+where $\nabla h_x$ is the Jacobian of the function $h$ with respect to $x$, evaluated at $\hat{x}_{k+1|k}$.
+Recursive application of the above equations every instant a new observation is gathered yields an updated estimate for the current robot location and its uncertainty. This recursive nature makes EKF the most computationally efficient algorithm available for robot localization.
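+As a concrete illustration, here is a minimal EKF localization sketch in Python. It assumes a hypothetical unicycle-style motion model and a single range-bearing landmark observation; the motion model, landmark position, and noise covariances are all made-up illustrative values, not part of the source material.
+
+import numpy as np
+
+dt = 0.1
+landmark = np.array([5.0, 3.0])                 # hypothetical landmark position
+Q = np.diag([0.05, 0.05, 0.01]) ** 2            # process noise covariance (illustrative)
+R = np.diag([0.1, 0.05]) ** 2                   # observation noise covariance (illustrative)
+
+def f(x, u):
+    """Unicycle motion model; u = (v, omega)."""
+    v, w = u
+    return x + np.array([v * dt * np.cos(x[2]), v * dt * np.sin(x[2]), w * dt])
+
+def F_x(x, u):
+    """Jacobian of f with respect to the state."""
+    v, _ = u
+    return np.array([[1, 0, -v * dt * np.sin(x[2])],
+                     [0, 1,  v * dt * np.cos(x[2])],
+                     [0, 0, 1]])
+
+def h(x):
+    """Range and bearing to the landmark."""
+    dx, dy = landmark - x[:2]
+    return np.array([np.hypot(dx, dy), np.arctan2(dy, dx) - x[2]])
+
+def H_x(x):
+    """Jacobian of h with respect to the state."""
+    dx, dy = landmark - x[:2]
+    r2 = dx**2 + dy**2
+    r = np.sqrt(r2)
+    return np.array([[-dx / r, -dy / r, 0],
+                     [dy / r2, -dx / r2, -1]])
+
+def ekf_step(x_hat, P, u, z):
+    # Predict using the process model.
+    x_pred = f(x_hat, u)
+    Fx = F_x(x_hat, u)
+    P_pred = Fx @ P @ Fx.T + Q                  # process noise is additive in this sketch
+    # Update using the observation.
+    nu = z - h(x_pred)                          # innovation
+    Hx = H_x(x_pred)
+    S = Hx @ P_pred @ Hx.T + R                  # innovation covariance
+    W = P_pred @ Hx.T @ np.linalg.inv(S)        # Kalman gain
+    return x_pred + W @ nu, P_pred - W @ S @ W.T
+
+x_hat, P = np.zeros(3), np.eye(3) * 0.01
+x_hat, P = ekf_step(x_hat, P, u=(1.0, 0.1), z=h(f(x_hat, (1.0, 0.1))))
+print(x_hat, np.diag(P))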
+An important prerequisite for EKF-based localization is the ability to associate measurements obtained with specific landmarks present in the environment. Landmarks may be artificial, for example, laser reflectors, or natural geometric features present in the environment such as line segments, corners, or planes. In many cases, the observation itself does not contain any information as to which particular landmark is being observed. Data association is the process in which a decision is made as to the correspondence between an observation from the sensor and a particular landmark. Data association is critical to the operation of an EKF-based localizer, as catastrophic failure may result if data association decisions are incorrect.
+The EKF relies on the assumptions that the nonlinear motion and observation models can be approximated well by linearized equations and that the sensor noises can be approximated using Gaussian distributions. These are reasonable assumptions under many practical conditions, and therefore the EKF is the obvious choice for solving the robot localization problem when the map of the environment consists of clearly identifiable landmarks.
Figure 2 shows the result of EKF localization for the simple problem given in Figure 1. The ground truth of the robot poses and the estimated robot poses are shown in red and blue, respectively. The 95% confidence ellipses obtained from the covariance matrices in the EKF estimation process are also shown in the figure.
+*Figure 1.*
*Figure 2.*
A Bayesian network is a snapshot of the system at a given time and is used to model systems that are in some kind of equilibrium state. Unfortunately, most systems in the world change over time and sometimes we are interested in how these systems evolve over time more than we are interested in their equilibrium states. Whenever the focus of our reasoning is change of a system over time, we need a tool that is capable of modeling dynamic systems.
+A dynamic Bayesian network (DBN) is a Bayesian network extended with additional mechanisms that are capable of modeling influences over time. The temporal extension of Bayesian networks does not mean that the network structure or parameters change dynamically, but that a dynamic system is modeled. In other words, the underlying process, modeled by a DBN, is stationary. A DBN is a model of a stochastic process.
+Basic idea: ensure that the population of samples (“particles”) tracks the high-likelihood regions of the state space, by replicating particles in proportion to their likelihood for the evidence $e_t$.
+*Figure: DBN particle filtering.*
Particle filtering is widely used for tracking nonlinear systems, especially in vision. It is also used for simultaneous localization and mapping in mobile robots, where the state space can have on the order of $10^5$ dimensions.
+Assume the sample population is consistent at time $t$: $N(x_t \mid e_{1:t})/N = P(x_t \mid e_{1:t})$.
+Propagate forward: the populations of $x_{t+1}$ are
+$$N(x_{t+1} \mid e_{1:t}) = \sum_{x_t} P(x_{t+1} \mid x_t)\, N(x_t \mid e_{1:t})$$
+Weight samples by their likelihood for $e_{t+1}$:
+$$W(x_{t+1} \mid e_{1:t+1}) = P(e_{t+1} \mid x_{t+1})\, N(x_{t+1} \mid e_{1:t})$$
+Resample to obtain populations proportional to $W$:
+$$
+\begin{aligned}
+N(x_{t+1} \mid e_{1:t+1})/N &= \alpha\, W(x_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid x_{t+1})\, N(x_{t+1} \mid e_{1:t}) \\
+&= \alpha'\, P(e_{t+1} \mid x_{t+1}) \sum_{x_t} P(x_{t+1} \mid x_t)\, P(x_t \mid e_{1:t}) \\
+&= P(x_{t+1} \mid e_{1:t+1})
+\end{aligned}
+$$
+The approximation error of particle filtering remains bounded over time, at least empirically; theoretical analysis is difficult.
*Figure: Error of DBN particle filtering.*
This note reviewed the key concepts of hidden Markov models for probabilistic sequence modeling, along with particle filtering and its application to robot localization.
+[1] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 4th ed. Pearson Education, Inc
+[2] Speech and Language Processing. Daniel Jurafsky & James H. Martin. https://web.stanford.edu/~jurafsky/slp3/A.pdf (Visited: 12/4/2021)
+[3] Science Direct Topics (Visited: 12/17/2021)
+[4] Cyrill Stachniss Youtube Channel (Visited: 17/4/2021)
+[5] Andreas Svensson Youtube Channel (Visited: 17/4/2021)
Chain Rule and HMMs
Consider an HMM with hidden state variables $X_1, \dots, X_T$ and evidence variables $E_1, \dots, E_T$.
From the chain rule, every joint distribution over $X_1, E_1, \dots, X_T, E_T$ can be written as:

$$P(X_1, E_1, \dots, X_T, E_T) = P(X_1)\, P(E_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_1, E_1, \dots, X_{t-1}, E_{t-1})\, P(E_t \mid X_1, E_1, \dots, X_{t-1}, E_{t-1}, X_t)$$
And because we make the following (Markov and sensor) assumptions:

$$P(X_t \mid X_1, E_1, \dots, X_{t-1}, E_{t-1}) = P(X_t \mid X_{t-1})$$
$$P(E_t \mid X_1, E_1, \dots, X_{t-1}, E_{t-1}, X_t) = P(E_t \mid X_t)$$
after simplification we have:

$$P(X_1, E_1, \dots, X_T, E_T) = P(X_1)\, P(E_1 \mid X_1) \prod_{t=2}^{T} P(X_t \mid X_{t-1})\, P(E_t \mid X_t)$$
HMMs appear in many real applications, for example speech recognition (acoustic signals as the observations, words as the hidden states) and the part-of-speech tagging task mentioned earlier.
Filtering/Monitoring
First of all, we define $B_t(X) = P(X_t \mid e_1, \dots, e_t)$ as the belief state; it is our estimate of the hidden variable given our observations from the start up to now. We start from an initial belief state $B_0(X)$ (usually uniform), and as time passes or we receive observations, we update the value of $B_t(X)$. In other words, we keep a vector whose length is the number of values the hidden variable can take, and each cell of this vector holds our estimate that this is the true value. We call this task of tracking the distribution $B_t(X)$ over time "filtering" or "monitoring".
Example: Robot Localization
Recall that robot localization is the problem of determining where a mobile robot is located with respect to its environment, given a map and noisy sensors. In this example, the sensor model reports in which directions there is a wall, with never more than one mistaken reading, and the motion model may fail to execute an action with some small probability. As mentioned earlier, $B_0(X)$ is assigned a uniform distribution.
Each cell is colored according to its probability; for example, at $t=0$ every cell has equal probability.
Cells that are compatible with our sensor evidence have the highest probability of being the robot's real location. Lighter grey cells could still produce the reading, but are less likely because that would require one mistaken reading; white cells would need even more mistakes, so their probability is near zero.

After a few more time steps and observations, the belief concentrates further, and eventually the answer is almost certain.
Passage of Time
If in the current state we have the belief $B(X_t) = P(X_t \mid e_{1:t})$, then after one time step passes, for $P(X_{t+1} \mid e_{1:t})$ we have:

$$P(X_{t+1} \mid e_{1:t}) = \sum_{x_t} P(X_{t+1} \mid x_t)\, P(x_t \mid e_{1:t})$$
We know $P(X_{t+1} \mid e_{1:t})$ isn't what we defined as $B_{t+1}(X)$, so we call it $B'(X_{t+1})$, and we have:

$$B'(X_{t+1}) = \sum_{x_t} P(X' \mid x_t)\, B(x_t)$$
We call the first factor, $P(X' \mid x_t)$, the transition term, and say that beliefs get “pushed” through the transitions.
Example: Passage of Time
In this model, as time passes, uncertainty about the answer accumulates and increases.
Observation
Now we want to incorporate the observation into our estimate of the hidden variable's next value:

$$P(X_{t+1} \mid e_{1:t+1}) \propto P(e_{t+1} \mid X_{t+1})\, P(X_{t+1} \mid e_{1:t})$$
And now, at each step, after the observation we have:

$$B(X_{t+1}) \propto P(e_{t+1} \mid X_{t+1})\, B'(X_{t+1})$$
We already named the second factor, $P(X_{t+1} \mid e_{1:t}) = B'(X_{t+1})$, the (predicted) belief, and we say that beliefs get “reweighted” by the likelihood of the evidence.

**Note:** unlike the passage-of-time update, here we have to renormalize.
Example: Observation
In this model, as we get observations, beliefs get reweighted and uncertainty about the answer decreases.
Example: Weather HMM
In this example we want to predict the weather by checking whether our friend comes in with an umbrella or not. On the first day we use a uniform distribution, but from then on, each day we compute $B'$ and then, after observing the umbrella, compute $B$ for that day; doing this every day decreases the uncertainty.
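A minimal runnable sketch of these two updates for the umbrella HMM, assuming the standard illustrative parameters (70% chance the weather persists, 90%/20% chance of the umbrella given rain/no rain), looks like this:

import numpy as np

# Umbrella HMM (states ordered [rain, no rain]); illustrative parameters only.
T = np.array([[0.7, 0.3], [0.3, 0.7]])     # transition model P(X_{t+1} | X_t)
P_umbrella = np.array([0.9, 0.2])          # sensor model P(umbrella | X)

B = np.array([0.5, 0.5])                   # day 0: uniform belief
for day in range(1, 4):                    # umbrella observed on three consecutive days
    B_prime = T.T @ B                      # passage of time: B'(X_{t+1})
    B = B_prime * P_umbrella               # observation: reweight by likelihood
    B = B / B.sum()                        # renormalize
    print(f"day {day}: B' = {np.round(B_prime, 3)}, B = {np.round(B, 3)}")

The printed beliefs become more and more confident that it is raining, mirroring the decreasing uncertainty described above.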
The Forward Algorithm
We are given evidence at each time step and want to know:

$$B_t(X) = P(X_t \mid e_{1:t})$$
We can derive the following update:

$$P(x_t \mid e_{1:t}) \propto P(x_t, e_{1:t}) = P(e_t \mid x_t) \sum_{x_{t-1}} P(x_t \mid x_{t-1})\, P(x_{t-1}, e_{1:t-1})$$
We can normalize as we go if we want to have P(x|e) at each time step, or just once at the end. But which is better?
Online Belief Updates
Every time step, we start with the current $P(X \mid \text{evidence})$.

We update for time:

$$P(x_t \mid e_{1:t-1}) = \sum_{x_{t-1}} P(x_{t-1} \mid e_{1:t-1})\, P(x_t \mid x_{t-1})$$
We update for evidence:
$$P(x_t \mid e_{1:t}) \propto P(x_t \mid e_{1:t-1})\, P(e_t \mid x_t)$$
Particle Filtering
In some problems $|X|$ is too big for exact computation or even for storing $B(X)$, so it's almost impossible to use the previous algorithms; for example, when $X$ is continuous. In these situations we must use approximate inference.

In this algorithm we track just samples of $X$, not all values, and call these samples particles. Time per step is linear in the number of samples, but the number needed may be large, and in memory we store the list of particles rather than the full table over states.
Now we represent $P(X)$ by a list of $N$ particles, and $P(x)$ is approximated by the number of particles with value $x$. Generally $N \ll |X|$, so many values of $x$ may end up with $P(x) = 0$, which is undesirable. To address this we must use more particles to achieve more accuracy. (For now, assume all particles have the same weight.)
Elapse Time
Each particle is moved by the transition model to its next position:

$$x' = \text{sample}(P(X' \mid x))$$
Just like prior sampling, the sample frequencies reflect the transition probabilities. As mentioned earlier, to get close to the exact values, we must use enough samples.
Observe
Just like likelihood weighting, each sample's weight is computed based on the evidence. (As before, the weights don't sum to one, since all of them have been down-weighted; they will need normalizing.)
Resample
We use resampling ($N$ times) instead of tracking weighted samples: we draw from our weighted sample distribution (i.e., draw with replacement). This method is equivalent to renormalizing the distribution.

And now the update is complete for this time step; we continue with the next one.
Robot Localization

In robot localization, we know the map, but not the robot's position. An example of observations would be vectors of range-finder readings: our agent has a few sensors, each reporting the distance to an obstacle in a specific direction. The state space and the readings are typically continuous (it works basically like a very fine grid), so we cannot store $B(X)$. Because of this property of the problem, particle filtering is the main technique.

So, we use many particles, uniformly distributed over the map. Then, after each iteration, we become reluctant to trust those particles whose readings are improbable. As a result, since the map would look different from the viewpoint of a wrongly placed particle, we end up with our particles centered on the real position.

The depiction below shows this perfectly: the red dots represent particles, and notice how the algorithm can't decide between two positions until the robot enters a room. What algorithm do you think would be better to drive the agent with, so that we can find and benefit from asymmetries in the map? (Think about random walks.)

We can even go a step further and forget about the map. This problem is called **Simultaneous Localization And Mapping**, or **SLAM** for short. In this version of the problem, we neither know where the agent is nor what the map is; we have to find both.

To solve this problem, we extend our states to also cover the map. For example, we can represent the map with a matrix of 1s and 0s, where an element is 1 if the corresponding region of the map is blocked. To solve this problem we use Kalman filtering and particle methods.

Notice how the robot starts with complete certainty about its position; as time goes on, it begins to doubt whether it might actually be a little way off from its believed position (since the readings would have been close to what they are now), and this leads to uncertainty even about the position. When the agent completes a full loop, it understands that it should be back at the same position, so its certainty about its position rises once again.
Dynamic Bayes Net

Dynamic Bayesian Networks (**DBN**) extend standard Bayesian networks with the concept of time. This allows us to model time series or sequences. In fact, they can model complex multivariate time series, which means we can model the relationships between multiple time series in the same model, as well as different regimes of behavior, since time series often behave differently in different contexts.

DBN Particle Filters

A particle is a complete sample for a time step. This is similar to regular particle filtering, except that we use the sampling methods introduced earlier in the course instead of an explicit distribution. Below are the steps we have to follow:

* Initialize: generate prior samples for the $t=1$ Bayes net, e.g. particle $G_1^a = (3,3)$, $G_1^b = (5,3)$.
* Elapse time: sample a successor for each particle, e.g. successor $G_2^a = (2,3)$, $G_2^b = (6,3)$.
* Observe: weight each entire sample by the likelihood of the evidence conditioned on the sample, e.g. likelihood $p(E_1^a \mid G_1^a) \times p(E_1^b \mid G_1^b)$.
* Resample: select prior samples (tuples of values) in proportion to their likelihood.
The doctor diagnoses fever by asking patients how they feel. The villagers may only answer that they feel normal, dizzy, or cold.\n", - "\n", - "The doctor believes that the health condition of his patients operates as a discrete Markov chain. There are two states, \"Healthy\" and \"Fever\", but the doctor cannot observe them directly; they are hidden from him. On each day, there is a certain chance that the patient will tell the doctor he is \"normal\", \"cold\", or \"dizzy\", depending on his health condition.\n", - "\n", - "The observations (normal, cold, dizzy) along with a hidden state (healthy, fever) form a hidden Markov model (HMM).\n", - "\n", - "In this piece of code, start_p represents the doctor's belief about which state the HMM is in when the patient first visits (all he knows is that the patient tends to be healthy). The particular probability distribution used here is not the equilibrium one, which is (given the transition probabilities) approximately `{'Healthy': 0.57, 'Fever': 0.43}`. The transition_p represents the change of the health condition in the underlying Markov chain. In this example, there is only a 30% chance that tomorrow the patient will have a fever if he is healthy today. The emit_p represents how likely each possible observation, normal, cold, or dizzy is given their underlying condition, healthy or fever. If the patient is healthy, there is a 50% chance that he feels normal; if he has a fever, there is a 60% chance that he feels dizzy. \n", - "\n", - "\n", - "\n", - "The patient visits three days in a row and the doctor discovers that on the first day he feels normal, on the second day he feels cold, on the third day he feels dizzy. The doctor has a question: what is the most likely sequence of health conditions of the patient that would explain these observations?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "brief-reconstruction", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " 0 1 2\n", - "Healthy: 0.30000 0.08400 0.00588\n", - "Fever: 0.04000 0.02700 0.01512\n", - "The steps of states are Healthy Healthy Fever with highest probability of 0.01512\n" - ] - } - ], - "source": [ - "obs = (\"normal\", \"cold\", \"dizzy\")\n", - "states = (\"Healthy\", \"Fever\")\n", - "start_p = {\"Healthy\": 0.6, \"Fever\": 0.4}\n", - "trans_p = {\n", - " \"Healthy\": {\"Healthy\": 0.7, \"Fever\": 0.3},\n", - " \"Fever\": {\"Healthy\": 0.4, \"Fever\": 0.6},\n", - "}\n", - "emit_p = {\n", - " \"Healthy\": {\"normal\": 0.5, \"cold\": 0.4, \"dizzy\": 0.1},\n", - " \"Fever\": {\"normal\": 0.1, \"cold\": 0.3, \"dizzy\": 0.6},\n", - "}\n", - "\n", - "def viterbi(obs, states, start_p, trans_p, emit_p):\n", - " V = [{}]\n", - " for st in states:\n", - " V[0][st] = {\"prob\": start_p[st] * emit_p[st][obs[0]], \"prev\": None}\n", - " # Run Viterbi when t > 0\n", - " for t in range(1, len(obs)):\n", - " V.append({})\n", - " for st in states:\n", - " max_tr_prob = V[t - 1][states[0]][\"prob\"] * trans_p[states[0]][st]\n", - " prev_st_selected = states[0]\n", - " for prev_st in states[1:]:\n", - " tr_prob = V[t - 1][prev_st][\"prob\"] * trans_p[prev_st][st]\n", - " if tr_prob > max_tr_prob:\n", - " max_tr_prob = tr_prob\n", - " prev_st_selected = prev_st\n", - "\n", - " max_prob = max_tr_prob * emit_p[st][obs[t]]\n", - " V[t][st] = {\"prob\": max_prob, \"prev\": prev_st_selected}\n", - "\n", - " for line in dptable(V):\n", - " print(line)\n", - "\n", - " opt = []\n", - " max_prob = 0.0\n", - " best_st = None\n", - " # Get most probable state and its backtrack\n", - " for st, data in V[-1].items():\n", - " if data[\"prob\"] > max_prob:\n", - " max_prob = data[\"prob\"]\n", - " best_st = st\n", - " opt.append(best_st)\n", - " previous = best_st\n", - "\n", - " # Follow the backtrack till the first observation\n", - " for t in range(len(V) - 2, -1, -1):\n", - " opt.insert(0, V[t + 1][previous][\"prev\"])\n", - " previous = V[t + 1][previous][\"prev\"]\n", - "\n", - " print (\"The steps of states are \" + \" \".join(opt) + \" with highest probability of %s\" % max_prob)\n", - "\n", - "def dptable(V):\n", - " # Print a table of steps from dictionary\n", - " yield \" \" * 5 + \" \".join((\"%3d\" % i) for i in range(len(V)))\n", - " for state in V[0]:\n", - " yield \"%.7s: \" % state + \" \".join(\"%.7s\" % (\"%lf\" % v[state][\"prob\"]) for v in V)\n", - "\n", - "viterbi(obs, states, start_p, trans_p, emit_p)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 5 -} diff --git a/notebooks/11_temporal_probability_models/index4.ipynb b/notebooks/11_temporal_probability_models/index4.ipynb deleted file mode 100644 index 1cd14852..00000000 --- a/notebooks/11_temporal_probability_models/index4.ipynb +++ /dev/null @@ -1,66 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
Forward/Viterbi Algorithm
The Viterbi algorithm is a dynamic programming algorithm for obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states, called the Viterbi path, that results in a sequence of observed events, especially in the context of Markov information sources and hidden Markov models (HMMs).
And we have:
$$f_t[x_t] = P(x_t, e_{1:t}) = P(e_t \mid x_t) \sum_{x_{t-1}} P(x_t \mid x_{t-1})\, f_{t-1}[x_{t-1}]$$

$$m_t[x_t] = \max_{x_{1:t-1}} P(x_{1:t-1}, x_t, e_{1:t}) = P(e_t \mid x_t) \max_{x_{t-1}} P(x_t \mid x_{t-1})\, m_{t-1}[x_{t-1}]$$

(The forward message uses a sum where the Viterbi message uses a max.)
There is a linear-time algorithm for finding the most likely sequence, but it requires a little more thought. It relies on the same Markov property that yielded efficient algorithms for filtering and smoothing. The easiest way to think about the problem is to view each sequence as a path through a graph whose nodes are the possible states at each time step. Now consider the task of finding the most likely path through this graph, where the likelihood of any path is the product of the transition probabilities along the path and the probabilities of the given observations at each state. Let's focus in particular on paths that reach the state $Rain_5 = true$. Because of the Markov property, it follows that the most likely path to the state $Rain_5 = true$ consists of the most likely path to some state at time 4 followed by a transition to $Rain_5 = true$; and the state at time 4 that will become part of the path to $Rain_5 = true$ is whichever maximizes the likelihood of that path. In other words, there is a recursive relationship between the most likely paths to each state $x_{t+1}$ and the most likely paths to each state $x_t$. We can write this relationship as an equation connecting the probabilities of the paths:

$$\max_{x_1 \dots x_t} P(x_1, \dots, x_t, X_{t+1} \mid e_{1:t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \max_{x_t} \Big( P(X_{t+1} \mid x_t) \max_{x_1 \dots x_{t-1}} P(x_1, \dots, x_{t-1}, x_t \mid e_{1:t}) \Big)$$
An example of HMM
Suppose someone wants to spy on HTTPS connections and infer the sequence of webpages being browsed. How can they do this? If they measure the sequence of sizes of incoming packets as noisy observations and define the contents of the packets as hidden variables, they can use an HMM to reach that goal!
The transition model can be calculated from the links on each webpage: the probability of choosing the next webpage depends on the links on the current one, i.e., we do a random walk between webpages. After accounting for details such as dynamically generated content, user-specific content, and so on, we can run the HMM algorithm to estimate the probability distribution P(packet size | webpage).
In the accompanying chart, the error of this attack is around 10% (BoG), which is quite shocking; nowadays deep learning can decrease the error to nearly 0%! Do you think you have security?!
\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/notebooks/11_temporal_probability_models/metadata.yml b/notebooks/11_temporal_probability_models/metadata.yml index 18f0d5c1..ec8d2b52 100644 --- a/notebooks/11_temporal_probability_models/metadata.yml +++ b/notebooks/11_temporal_probability_models/metadata.yml @@ -1,38 +1,37 @@ -title: LN | Temporal Probability Models +title: Temporal Probability Models header: - title: Temporal Probability Models (Markov Models and Particle Filtering) + title: Temporal Probability Models (from filter/monitor till the end) + description: A comprehensive look at Temporal Probability Models and its applications authors: label: position: top text: Authors - kind: people content: - - name: Ali Mirzaei Saghezchi + - name: Mohammadreza Mofayezi role: Author contact: - icon: fab fa-github - link: https://github.com/mehrdad7008 + link: https://github.com/ckoorosh - icon: fas fa-envelope - link: mailto:mehrdad7008@gmail.com + link: mailto:mofayezi.m@gmail.com - - name: Arman Mohammadi + - name: Ali Hatami role: Author contact: + - icon: fab fa-github + link: https://github.com/alihatamitajik - icon: fas fa-envelope - link: mailto:rman.mo2000@gmail.com + link: mailto:a.hatam008@gmail.com - - name: Arman Babaei + - name: Pouria Momtaz role: Author contact: - icon: fab fa-github - link: https://github.com/arman17babaei - - icon: fas fa-envelope - link: mailto:292.arma@gmail.com - - - name: Mahdi Ghaznavi - role: Supervisor - contact: + link: https://github.com/pourya-momtaz - icon: fas fa-envelope - link: mailto:ghaznavi.mahdi@gmail.com + link: mailto:pouryamz19@gmail.com +comments: + label: false + kind: comments diff --git a/notebooks/11_temporal_probability_models/resource/dbn.png b/notebooks/11_temporal_probability_models/resource/dbn.png deleted file mode 100644 index b46b38fe..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/dbn.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/decorative_1.png b/notebooks/11_temporal_probability_models/resource/decorative_1.png deleted file mode 100644 index bfe5dcb2..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/decorative_1.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/health.png b/notebooks/11_temporal_probability_models/resource/health.png deleted file mode 100644 index 316cb87f..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/health.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/hmm-weather.png b/notebooks/11_temporal_probability_models/resource/hmm-weather.png deleted file mode 100644 index c4b524dd..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/hmm-weather.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/hmm.png b/notebooks/11_temporal_probability_models/resource/hmm.png deleted file mode 100644 index 659636a2..00000000 Binary files 
a/notebooks/11_temporal_probability_models/resource/hmm.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/markov_chain.png b/notebooks/11_temporal_probability_models/resource/markov_chain.png deleted file mode 100644 index 722aed98..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/markov_chain.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/mini-forward.png b/notebooks/11_temporal_probability_models/resource/mini-forward.png deleted file mode 100644 index 6f539cb5..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/mini-forward.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/mle.png b/notebooks/11_temporal_probability_models/resource/mle.png deleted file mode 100644 index 03dafeb0..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/mle.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/represent_1.png b/notebooks/11_temporal_probability_models/resource/represent_1.png deleted file mode 100644 index c90e571c..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/represent_1.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/represent_2.png b/notebooks/11_temporal_probability_models/resource/represent_2.png deleted file mode 100644 index a5e4406e..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/represent_2.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/robot-localization.gif b/notebooks/11_temporal_probability_models/resource/robot-localization.gif deleted file mode 100644 index 8c5f4c63..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/robot-localization.gif and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/stationary-decorative.png b/notebooks/11_temporal_probability_models/resource/stationary-decorative.png deleted file mode 100644 index 7e3b1799..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/stationary-decorative.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/trellis.png b/notebooks/11_temporal_probability_models/resource/trellis.png deleted file mode 100644 index c67aec00..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/trellis.png and /dev/null differ diff --git a/notebooks/11_temporal_probability_models/resource/weather_example.png b/notebooks/11_temporal_probability_models/resource/weather_example.png deleted file mode 100644 index 6ee190f6..00000000 Binary files a/notebooks/11_temporal_probability_models/resource/weather_example.png and /dev/null differ