Reinforcement Learning

Table of contents
- Policy Evaluation

Policy Evaluation
- The intensity of the blue bar shows the influence of the state value in the weighted average
- The state value history is visualized as a bar chart in each state
- Red indicates a negative state value
- Dark gray indicates the terminal states
Solving Problems By Searching

Table of contents
- Node Expansion
- Getting in the shoes of a Search Agent
- Breadth First Search
- Depth First Search
- Step Costs
- Uniform Cost Search
- Depth Limited Search
- Iterative Deepening Depth-First Search
- Bi-directional BFS
- A Star Search

Node Expansion
To search through a graph, a Search Agent needs to expand nodes. The nodes which can currently be expanded by the Agent together form the frontier. Expanding a node means marking it as 'expanded' or 'visited' and adding its immediate neighbors to the frontier.
Frontier Nodes
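The expansion step can be sketched in a few lines of Python (the adjacency-list graph below is a hypothetical example, not the one in the diagram):

```python
# Minimal sketch of node expansion on a hypothetical adjacency-list graph.
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': []}

expanded = set()
frontier = ['A']          # the agent starts with only the initial node visible

def expand(node):
    """Mark `node` as expanded and add its unseen neighbors to the frontier."""
    frontier.remove(node)
    expanded.add(node)
    for neighbor in graph[node]:
        if neighbor not in expanded and neighbor not in frontier:
            frontier.append(neighbor)

expand('A')
print(sorted(frontier))   # ['B', 'C'] — both neighbors joined the frontier
```

Which frontier node gets expanded next is exactly what distinguishes the search strategies below.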
Getting in the shoes of a Search Agent
Let's see the perspective of a Search Agent as it searches through the graph.
Remember, the Agent can only see the nodes which are either expanded or currently present in the frontier.
Breadth First Search
In Breadth First Search, the node that was discovered earliest is expanded next, i.e. the node that joined the frontier earlier is expanded earlier.
To achieve this, the Breadth First Search algorithm uses a FIFO (First In First Out) queue. The following graph is explored by the Breadth First Search algorithm with 'A' as the initial node.
FIFO Queue
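The FIFO-queue strategy can be sketched as follows (again on a hypothetical graph, not the one drawn above):

```python
from collections import deque

# Hypothetical graph; BFS from 'A' expands nodes in order of discovery.
graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F'], 'D': [], 'E': [], 'F': []}

def bfs_order(start):
    frontier = deque([start])      # FIFO queue: oldest discovery expanded first
    seen = {start}
    expanded = []
    while frontier:
        node = frontier.popleft()  # pop from the front of the queue
        expanded.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return expanded

print(bfs_order('A'))  # ['A', 'B', 'C', 'D', 'E', 'F'] — level by level
```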
Depth First Search
In Depth First Search, the node that was discovered latest is expanded next, i.e. the node that joined the frontier most recently is expanded first.
To achieve this, the Depth First Search algorithm uses a LIFO (Last In First Out) stack. The following graph is explored by the Depth First Search algorithm with 'A' as the initial node.
LIFO Queue
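Swapping the FIFO queue for a LIFO stack is the only change needed, as this sketch on the same hypothetical graph shows:

```python
# Hypothetical graph; DFS from 'A' uses a LIFO stack as the frontier.
graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F'], 'D': [], 'E': [], 'F': []}

def dfs_order(start):
    frontier = [start]             # LIFO stack: newest discovery expanded first
    expanded = []
    while frontier:
        node = frontier.pop()      # pop from the top of the stack
        if node in expanded:
            continue
        expanded.append(node)
        # push neighbors in reverse so the first-listed neighbor is expanded first
        for neighbor in reversed(graph[node]):
            if neighbor not in expanded:
                frontier.append(neighbor)
    return expanded

print(dfs_order('A'))  # ['A', 'B', 'D', 'E', 'C', 'F'] — dives deep first
```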
Step Costs
Until now, all the edges in our graph had the same cost (that's why we didn't bother to mention costs on the graph). For those kinds of graphs, Breadth First Search is optimal because it always pops the shallowest node first. When step costs are involved in exploring the nodes, the BFS algorithm needs to be extended.
Uniform Cost Search (Extension of BFS)
Uniform Cost Search
For Uniform Cost Search, instead of a simple FIFO queue, a priority queue is used, where the cost of reaching a node from the initial node is treated as its priority. On each iteration, the node with the smallest cost is extracted from the frontier for expansion.
costs <= 0
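A minimal sketch of the idea, using Python's `heapq` as the priority queue on a hypothetical weighted graph:

```python
import heapq

# Hypothetical weighted graph: neighbor -> step cost.
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'C': 1, 'D': 5},
    'C': {'D': 1},
    'D': {},
}

def uniform_cost_search(start, goal):
    # Priority queue keyed by the path cost from the start node.
    frontier = [(0, start, [start])]
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)   # cheapest node first
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for neighbor, step in graph[node].items():
            if neighbor not in explored:
                heapq.heappush(frontier, (cost + step, neighbor, path + [neighbor]))
    return None

print(uniform_cost_search('A', 'D'))  # (3, ['A', 'B', 'C', 'D'])
```

Note how the direct-looking edge B→D (cost 5) loses to the longer but cheaper path through C.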
Depth Limited Search
The Depth Limited Search is the same as Depth First Search except that there is an upper limit on the depth of the nodes the algorithm traverses. Nodes with depths greater than this limit are not expanded by the Depth Limited Search.
Depth Limit :
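A recursive sketch on a hypothetical graph; returning the special value `'cutoff'` signals that the goal may still exist beyond the limit:

```python
# Recursive Depth Limited Search sketch on a hypothetical graph.
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': ['E'], 'E': []}

def depth_limited_search(node, goal, limit):
    """Return a path to `goal`, 'cutoff' if the limit was hit, or None."""
    if node == goal:
        return [node]
    if limit == 0:
        return 'cutoff'                      # goal may lie deeper than the limit
    cutoff_occurred = False
    for child in graph[node]:
        result = depth_limited_search(child, goal, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return 'cutoff' if cutoff_occurred else None

print(depth_limited_search('A', 'E', 2))  # 'cutoff' — E sits at depth 3
print(depth_limited_search('A', 'E', 3))  # ['A', 'B', 'D', 'E']
```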
Iterative Deepening Depth-First Search
Iterative Deepening Depth-First Search is a general strategy used to find the best depth limit. It does this by applying Depth Limited Search to the given problem with increasing depth limits (0, 1, 2, 3, and so on).
Depth Limit :
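The strategy is a thin loop around Depth Limited Search, sketched here on the same kind of hypothetical graph:

```python
# Iterative deepening sketch: run Depth Limited Search with limits 0, 1, 2, ...
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': [], 'D': ['E'], 'E': []}

def dls(node, goal, limit):
    if node == goal:
        return [node]
    if limit == 0:
        return 'cutoff'
    cutoff = False
    for child in graph[node]:
        result = dls(child, goal, limit - 1)
        if result == 'cutoff':
            cutoff = True
        elif result is not None:
            return [node] + result
    return 'cutoff' if cutoff else None

def iterative_deepening(start, goal, max_depth=20):
    for limit in range(max_depth + 1):
        result = dls(start, goal, limit)
        if result != 'cutoff':
            return result        # either a path, or None (goal unreachable)
    return None

print(iterative_deepening('A', 'E'))  # ['A', 'B', 'D', 'E'] — found at limit 3
```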
Bi-directional BFS
In bi-directional BFS, we run two simultaneous searches, one from the initial state and one from the goal state, and stop when the two searches meet in the middle (i.e. a node has been explored by both).
The motivation behind this is that b^(d/2) + b^(d/2) is much smaller than b^d, where b is the branching factor and d is the depth of the solution.
In the diagram below, we compare the performance between bi-directional BFS and standard BFS side by side. The total number of nodes generated for each strategy is given below the diagram.
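A sketch of the meet-in-the-middle check on a small hypothetical undirected graph, returning the shortest-path length:

```python
from collections import deque

# Hypothetical undirected graph; the two BFS frontiers meet in the middle.
graph = {
    'A': ['B', 'C'], 'B': ['A', 'D'], 'C': ['A', 'E'],
    'D': ['B', 'F'], 'E': ['C', 'F'], 'F': ['D', 'E'],
}

def bidirectional_bfs(start, goal):
    if start == goal:
        return 0
    dist_fwd, dist_bwd = {start: 0}, {goal: 0}
    q_fwd, q_bwd = deque([start]), deque([goal])
    while q_fwd and q_bwd:
        # advance each search by one expansion per round
        for queue, dist, other in ((q_fwd, dist_fwd, dist_bwd),
                                   (q_bwd, dist_bwd, dist_fwd)):
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor in other:          # the two searches met here
                    return dist[node] + 1 + other[neighbor]
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
    return None

print(bidirectional_bfs('A', 'F'))  # 3, e.g. the path A-B-D-F
```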
A Star Search
Click on the canvas to restart the simulation
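The section relies on the interactive canvas; as a textual sketch of the algorithm named above, A* expands the frontier node minimizing f(n) = g(n) + h(n), where g is the path cost so far and h an admissible heuristic. The grid and heuristic below are hypothetical stand-ins for the simulation:

```python
import heapq

# A* on a small hypothetical grid (0 = free, 1 = wall), 4-connected moves.
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

def a_star(start, goal):
    def h(cell):  # Manhattan distance: admissible for 4-connected grids
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start, [start])]     # (f, g, cell, path)
    best_g = {start: 0}
    while frontier:
        f, g, cell, path = heapq.heappop(frontier) # smallest f = g + h first
        if cell == goal:
            return path
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float('inf')):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier,
                                   (ng + h((nr, nc)), ng, (nr, nc), path + [(nr, nc)]))
    return None

path = a_star((0, 0), (2, 0))
print(len(path) - 1)  # 6 — the wall forces a detour through the single gap
```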
diff --git a/4-Beyond-Classical-Search/index.html b/4-Beyond-Classical-Search/index.html
index d272877..f857c4b 100644
--- a/4-Beyond-Classical-Search/index.html
+++ b/4-Beyond-Classical-Search/index.html
Beyond classical search
Table of contents
- Optimization Problem
- Hill Climbing Search
- Simulated Annealing
- Genetic Algorithm
- Searching with non-deterministic actions
- The erratic vacuum world
- Searching with Partial Observations
- Vacuum World with no observation
- And-Or-Graph-Search
- Online DFS Agent
- LRTA*-Agent

Optimization Problem
In many optimization problems, the path to the solution is irrelevant. In pure optimization problems, the best state is defined by the objective function. To represent such problems, a state-space landscape is used: location (state) is represented by the x-axis and elevation (objective function value) by the y-axis. The best state is hence the state with the highest objective value.
The given diagram is a state-space representation of an objective function. You can click anywhere inside the box to reveal the elevation there. You are allowed 25 moves to find the highest peak before the hill is revealed.
Hill Climbing Search
In hill climbing search, the current node is replaced by the best neighbor. In this case, the objective function is represented by elevation, the neighbors of a state are the states to its left and right, and the best neighbor is the neighbor state with the highest elevation.
The represents global maxima and represents the states from which the hill climbing search can reach a global maximum.
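On a 1-D landscape like the one above, the procedure is a few lines; the elevations below are hypothetical:

```python
# Hill climbing on a 1-D "landscape": neighbors are the indices left and right.
elevation = [1, 3, 5, 4, 2, 6, 9, 7, 3]   # hypothetical objective values

def hill_climb(state):
    while True:
        neighbors = [s for s in (state - 1, state + 1) if 0 <= s < len(elevation)]
        best = max(neighbors, key=lambda s: elevation[s])
        if elevation[best] <= elevation[state]:
            return state                    # no better neighbor: a maximum
        state = best

print(hill_climb(0))  # 2 — stuck on the local peak of height 5
print(hill_climb(8))  # 6 — this start reaches the global peak of height 9
```

The two runs illustrate why the starting state determines whether a global maximum is reachable.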
Simulated Annealing
Simulated Annealing is a combination of Hill Climbing and Random Walk to gain more efficiency and completeness. In this procedure, instead of always moving to the best neighbor, a random neighbor is chosen. If the new state has a better objective value than the current state, it is always chosen. If not, the algorithm accepts the new state with a probability less than one. The probability of choosing a bad state depends on:
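A minimal sketch on the same kind of 1-D landscape, assuming the standard Boltzmann acceptance rule exp(delta / T) and a simple cooling schedule (both are conventional choices, not necessarily what the widget uses):

```python
import math
import random

# Simulated annealing sketch on a hypothetical 1-D landscape.
elevation = [1, 3, 5, 4, 2, 6, 9, 7, 3]

def simulated_annealing(state, steps=10_000, t0=10.0):
    random.seed(0)
    for k in range(1, steps + 1):
        t = t0 / k                          # temperature cools over time
        neighbors = [s for s in (state - 1, state + 1) if 0 <= s < len(elevation)]
        nxt = random.choice(neighbors)      # a *random* neighbor, not the best
        delta = elevation[nxt] - elevation[state]
        # always accept improvements; accept worse moves with prob exp(delta/t)
        if delta > 0 or random.random() < math.exp(delta / max(t, 1e-9)):
            state = nxt
    return state

print(elevation[simulated_annealing(0)])
```

Early on (high temperature) bad moves are accepted often, letting the search escape local peaks; as the temperature drops it behaves more and more like hill climbing.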
Genetic Algorithm
Little critters change the color of their fur to match the background to camouflage themselves from predators.
Click on the canvas to generate the next generation. Keep clicking to generate another progeny.
Note: Single point crossover might not be suitable for all applications
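The single-point-crossover loop can be sketched on a toy version of the camouflage problem (genomes, target, and fitness below are hypothetical, not the widget's actual code):

```python
import random

# Single-point crossover sketch: "fur color" genomes evolve toward a
# hypothetical target background color.
random.seed(1)
TARGET = [7, 2, 9, 4, 1, 8]                  # background color channels

def fitness(genome):
    # higher is better; 0 means a perfect match with the background
    return -sum(abs(g - t) for g, t in zip(genome, TARGET))

def crossover(mom, dad):
    point = random.randrange(1, len(mom))    # single crossover point
    return mom[:point] + dad[point:]

def mutate(genome, rate=0.1):
    return [random.randrange(10) if random.random() < rate else g for g in genome]

population = [[random.randrange(10) for _ in range(6)] for _ in range(20)]
for _ in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                # keep the fittest half
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(10)]
    population = parents + children

best = max(population, key=fitness)
print(fitness(best))   # close to 0: the camouflage matches the background
```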
Searching with non-deterministic actions
In a world with non-deterministic actions, the result of an action is not known with complete certainty.
The erratic vacuum world
The erratic vacuum world is an extension of vacuum world from Chapter 2. In this world, the behavior of the cleaner is non-deterministic.
To define this behavior more formally, in this world, the Suck action works as follows:
- When applied to a dirty square, it cleans that square and sometimes cleans up dirt in an adjacent square too.
- When applied to a clean square, it sometimes deposits dirt on the carpet.
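Non-determinism means the transition model returns a *set* of possible successor states rather than one. A sketch for a two-tile world, where a state is (position, dirtLeft, dirtRight):

```python
# Sketch of the erratic Suck action as a set-valued transition model.
def results(state, action):
    pos, dirt_l, dirt_r = state
    if action != 'Suck':                     # movement is deterministic
        return {(('L' if action == 'Left' else 'R'), dirt_l, dirt_r)}
    outcomes = set()
    if (pos == 'L' and dirt_l) or (pos == 'R' and dirt_r):
        # dirty square: always cleaned; adjacent square *sometimes* cleaned too
        outcomes.add(('L', False, dirt_r) if pos == 'L' else ('R', dirt_l, False))
        outcomes.add((pos, False, False))
    else:
        # clean square: sometimes unchanged, sometimes deposits dirt
        outcomes.add(state)
        outcomes.add((pos, True, dirt_r) if pos == 'L' else (pos, dirt_l, True))
    return outcomes

print(len(results(('L', True, True), 'Suck')))  # 2 possible outcomes
```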
Moves
Searching with Partial Observations
Now we come back to a world where the actions of the robot are deterministic again (no erratic behavior like before) but the robot no longer has a complete sense of its current state or its environment.
Vacuum World with no observation
In this world, the vacuum cleaner initially has no idea about its own location or the location of dirt in the world. Since the robot has no percepts, it must figure out a sequence of actions that will work regardless of its initial state.
Given below are 8 random initial states. You can record a sequence of actions and see it in action just like before. Assume that illegal moves (like moving right in the right-most tile) have no effect on the world.
Try to find a sequence of actions that will lead to a final state (Clean all the dirt), no matter what the initial state of the world.
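With no percepts the agent can reason over belief states: sets of world states it might be in. The sketch below (a two-tile world, states written as (position, dirtLeft, dirtRight)) starts from all 8 possibilities and shows one conforming plan:

```python
from itertools import product

# Sensorless 2-tile vacuum world sketch: track the belief state (set of
# possible states) and apply one action sequence to every member at once.
def step(state, action):
    pos, dirt_l, dirt_r = state
    if action == 'Left':
        pos = 'L'
    elif action == 'Right':
        pos = 'R'
    elif action == 'Suck':
        if pos == 'L':
            dirt_l = False
        else:
            dirt_r = False
    return (pos, dirt_l, dirt_r)

# All 8 initial states: either position, any combination of dirt.
belief = {(p, dl, dr) for p, dl, dr in product('LR', [True, False], [True, False])}

for action in ['Right', 'Suck', 'Left', 'Suck']:
    belief = {step(s, action) for s in belief}

# After the plan, every possible world has collapsed to one clean state.
print(belief)  # {('L', False, False)}
```

The plan [Right, Suck, Left, Suck] works no matter which of the 8 states the world started in, which is exactly the property the exercise asks you to find.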
Moves
And-Or-Graph-Search
Online DFS Agent
Click to reset. Green tile is destination.
LRTA*-Agent
diff --git a/5-Adversarial-Search/index.html b/5-Adversarial-Search/index.html
index 3bb9c92..1f7a6a5 100644
--- a/5-Adversarial-Search/index.html
+++ b/5-Adversarial-Search/index.html
Adversarial Search
Table of contents
- Minimax
- Alpha Beta Pruning

Minimax
Click on the screen to restart the simulation.
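The simulation above is interactive; as a textual sketch, minimax backs utility values up a game tree in which MAX and MIN levels alternate (the tiny tree below is a hypothetical example):

```python
# Minimax sketch on a tiny hypothetical game tree: internal nodes alternate
# between MAX and MIN, leaves hold utilities from MAX's point of view.
tree = ('max',
        ('min', 3, 12, 8),
        ('min', 2, 4, 6),
        ('min', 14, 5, 2))

def minimax(node):
    if isinstance(node, int):
        return node                          # leaf: return its utility
    kind, *children = node
    values = [minimax(c) for c in children]
    return max(values) if kind == 'max' else min(values)

print(minimax(tree))  # 3 — MAX picks the branch whose worst case is best
```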
Alpha Beta Pruning
Click on the screen to restart the simulation.
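Alpha-beta pruning computes the same value as minimax while skipping branches that cannot affect the result. A sketch on the same hypothetical tree, counting the leaves actually visited:

```python
# Alpha-beta sketch: same value as minimax, but provably irrelevant
# branches are cut off.
tree = ('max',
        ('min', 3, 12, 8),
        ('min', 2, 4, 6),
        ('min', 14, 5, 2))

visited = []

def alphabeta(node, alpha=float('-inf'), beta=float('inf')):
    if isinstance(node, int):
        visited.append(node)
        return node
    kind, *children = node
    if kind == 'max':
        value = float('-inf')
        for c in children:
            value = max(value, alphabeta(c, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                        # beta cutoff
        return value
    value = float('inf')
    for c in children:
        value = min(value, alphabeta(c, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                            # alpha cutoff
    return value

print(alphabeta(tree), len(visited))  # 3 7 — two of the nine leaves pruned
```

In the second MIN branch, the very first leaf (2) already proves the branch is worse for MAX than the 3 secured earlier, so its remaining leaves are never examined.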
diff --git a/6-Constraint-Satisfaction-Problems/index.html b/6-Constraint-Satisfaction-Problems/index.html
index 8a4ef5b..dc10e39 100644
--- a/6-Constraint-Satisfaction-Problems/index.html
+++ b/6-Constraint-Satisfaction-Problems/index.html
Constraint Satisfaction Problems
Table of contents
- Defining CSP with Map Coloring Problem
- Arc consistency
- Sudoku Example of CSP
- Backtracking Search
- Min Conflicts
- Tree CSP

Defining CSP with Map Coloring Problem
A map coloring problem is a type of CSP where each state can be assigned a color from the set {red, green, blue}. The constraint involved says that no two neighbouring states are allowed to have the same color.
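The variables, domains, and constraints can be written down directly. As a sketch, the classic map of mainland Australia (a hypothetical stand-in for the widget's map) is small enough to check every assignment by brute force:

```python
from itertools import product

# Map coloring CSP sketch: enumerate assignments, keep the consistent ones.
regions = ['WA', 'NT', 'SA', 'Q', 'NSW', 'V']
neighbors = [('WA', 'NT'), ('WA', 'SA'), ('NT', 'SA'), ('NT', 'Q'),
             ('SA', 'Q'), ('SA', 'NSW'), ('SA', 'V'), ('Q', 'NSW'), ('NSW', 'V')]
colors = ['red', 'green', 'blue']

def consistent(assignment):
    # the binary constraint: adjacent regions must differ in color
    return all(assignment[a] != assignment[b] for a, b in neighbors)

solutions = [dict(zip(regions, combo))
             for combo in product(colors, repeat=len(regions))
             if consistent(dict(zip(regions, combo)))]
print(len(solutions))  # 6 of the 729 assignments satisfy every constraint
```

Brute force is only viable at this scale; the algorithms in the rest of this chapter exist precisely because real CSPs are far too large for it.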
Arc consistency
Y = X + 1
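For the constraint Y = X + 1, making the arc consistent means deleting every domain value that has no supporting value on the other side. A sketch, assuming both domains start as {0..9}:

```python
# Arc consistency sketch for Y = X + 1 with initial domains {0..9}.
def revise(dx, dy):
    """Keep only the x values that have some supporting y with y == x + 1."""
    return {x for x in dx if any(y == x + 1 for y in dy)}

domain_x = set(range(10))
domain_y = set(range(10))

domain_x = revise(domain_x, domain_y)                       # x needs y = x + 1
domain_y = {y for y in domain_y if any(y == x + 1 for x in domain_x)}

print(sorted(domain_x))  # [0, 1, 2, 3, 4, 5, 6, 7, 8] — 9 has no y = 10
print(sorted(domain_y))  # [1, 2, 3, 4, 5, 6, 7, 8, 9] — 0 has no x = -1
```

Algorithms like AC-3 repeat this revision over all arcs until no domain changes.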
Sudoku Example of CSP
All sudoku puzzles can be formulated as CSPs by considering each cell as a variable. The initial domain of all cells is {1,2,3,4,5,6,7,8,9}.
The constraints are formulated by the fact that in the solution of a sudoku puzzle, no two cells in a row, column or block can have identical numbers. Thus, there is an
AllDiff( ) constraint for all the rows, columns and blocks.
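An AllDiff constraint over a partially filled unit simply requires the filled cells to be pairwise distinct. A sketch (a hypothetical helper, not the widget's code):

```python
# AllDiff sketch: a row/column/block is consistent when its filled cells
# (None = still empty) are pairwise distinct.
def all_diff(cells):
    filled = [c for c in cells if c is not None]
    return len(filled) == len(set(filled))

print(all_diff([5, 3, None, 7, None]))  # True — no repeats so far
print(all_diff([5, 3, 5, None]))        # False — 5 appears twice
```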
Backtracking Search
Backtracking Search is a type of search algorithm that is useful for solving CSPs. It interweaves inference and search to arrive at a solution by pruning parts of the search tree.
It repeatedly chooses an unassigned variable and tries all the possible values for it. If any inconsistency is detected, it backtracks and tries another value for the previous assignment.
In the diagram below, the order in which variables are selected, as well as the order of values tried for each variable, is chosen randomly. Use the Next button to move forward and the Previous button to go back. You can also use the slider to jump to any state.
Assignments :
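The assign-check-undo loop can be sketched on the map coloring CSP from earlier in the chapter (the Australia-style adjacency below is a hypothetical example):

```python
# Backtracking search sketch on a map coloring CSP: assign one variable at a
# time and undo the assignment as soon as no value works downstream.
neighbors = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
             'SA': ['WA', 'NT', 'Q', 'NSW', 'V'], 'Q': ['NT', 'SA', 'NSW'],
             'NSW': ['Q', 'SA', 'V'], 'V': ['SA', 'NSW']}
colors = ['red', 'green', 'blue']

def backtrack(assignment):
    if len(assignment) == len(neighbors):
        return assignment                    # every variable assigned: done
    var = next(v for v in neighbors if v not in assignment)
    for color in colors:
        # only try values consistent with the neighbors assigned so far
        if all(assignment.get(n) != color for n in neighbors[var]):
            assignment[var] = color
            result = backtrack(assignment)
            if result is not None:
                return result
            del assignment[var]              # dead end below: undo and retry
    return None                              # no value worked: backtrack

solution = backtrack({})
print(all(solution[a] != solution[n]
          for a in neighbors for n in neighbors[a]))  # True
```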
Min Conflicts
Min-Conflicts is a local search method for CSPs: starting from a complete (possibly inconsistent) assignment, it repeatedly picks a variable that is involved in a conflict and reassigns it the value that minimizes the number of violated constraints.
Tree CSP
A CSP whose constraint graph forms a tree can be solved in linear time: pick a root, make every arc consistent from the leaves toward the root, then assign values from the root outward without ever needing to backtrack.