Mathematics, 30.03.2021 19:40 zafyafimli

Consider an MDP with 3 states, A, B and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP; instead, we are given samples of what an agent experiences when it interacts with the environment (although we do know that the agent never remains in the same state after taking an action). In this problem, we will first estimate the model (the transition function and the reward function), and then use the estimated model to find the optimal actions. To find the optimal actions, model-based RL computes the optimal V or Q value function with respect to the estimated T and R. This could be done with value iteration, policy iteration, or Q-value iteration. Last week you already solved some exercises that involved value iteration and policy iteration, so we will use Q-value iteration in this exercise.
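For reference, the Q-value iteration update with respect to an estimated model is usually written as below. The hats on T and R (marking estimates), the iteration index k, and the discount symbol \gamma are notation introduced here for illustration, not taken from the problem statement.

```latex
Q_{k+1}(s, a) = \sum_{s'} \hat{T}(s, a, s')
    \left[ \hat{R}(s, a, s') + \gamma \max_{a'} Q_k(s', a') \right]
```

Starting from Q_0(s, a) = 0 and repeating this update until the values stop changing gives the optimal Q-values for the estimated model; the optimal action in each state is then the one with the largest Q-value.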
Consider the following samples that the agent encountered.
s   a                  s'   r
A   Clockwise          B    0.0
A   Clockwise          B    0.0
A   Clockwise          B    0.0
A   Clockwise          C    -10.0
A   Clockwise          C    -10.0
B   Clockwise          A    -3.0
B   Clockwise          A    -3.0
B   Clockwise          A    -3.0
B   Clockwise          A    -3.0
B   Clockwise          C    0.0
C   Clockwise          A    0.0
C   Clockwise          A    0.0
C   Clockwise          A    0.0
C   Clockwise          B    6.0
C   Clockwise          B    6.0
A   Counterclockwise   C    -8.0
A   Counterclockwise   C    -8.0
A   Counterclockwise   C    -8.0
A   Counterclockwise   B    0.0
A   Counterclockwise   B    0.0
B   Counterclockwise   A    -10.0
B   Counterclockwise   A    -10.0
B   Counterclockwise   A    -10.0
B   Counterclockwise   A    -10.0
B   Counterclockwise   C    0.0
C   Counterclockwise   B    -8.0
C   Counterclockwise   B    -8.0
C   Counterclockwise   B    -8.0
C   Counterclockwise   B    -8.0
C   Counterclockwise   A    0.0
We start by estimating the transition function T(s, a, s') and the reward function R(s, a, s') for this MDP. Fill in the missing values in the following table for T(s, a, s') and R(s, a, s').
Discount factor, γ = 0.5

s   a                  s'   T(s, a, s')   R(s, a, s')
A   Clockwise          B    M             ____
A   Clockwise          C    ____          P
A   Counterclockwise   B    0.400         0.000
A   Counterclockwise   C    0.600         -8.000
B   Clockwise          A    0.800         -3.000
B   Clockwise          C    0.200         0.000
B   Counterclockwise   A    0.800         -10.000
B   Counterclockwise   C    0.200         0.000
C   Clockwise          A    0.600         0.000
C   Clockwise          B    0.400         6.000
C   Counterclockwise   A    0.200         0.000
C   Counterclockwise   B    0.800         -8.000
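The estimation step can be sketched in Python: T(s, a, s') is estimated as the fraction of (s, a) samples that ended in s', and R(s, a, s') as the average reward over those samples. This is a minimal sketch, not a reference solution. The `samples` list is one reading of the sample listing as (s, a, s', r) tuples and is an assumption that should be checked against the original data; the function names `estimate_model` and `q_value_iteration` are hypothetical.

```python
from collections import defaultdict

CW, CCW = "Clockwise", "Counterclockwise"

# ASSUMPTION: one reading of the sample listing as (s, a, s', r) tuples.
# The counts per (s, a, s') should be checked against the original samples.
samples = (
    [("A", CW, "B", 0.0)] * 3 + [("A", CW, "C", -10.0)] * 2
    + [("B", CW, "A", -3.0)] * 4 + [("B", CW, "C", 0.0)]
    + [("C", CW, "A", 0.0)] * 3 + [("C", CW, "B", 6.0)] * 2
    + [("A", CCW, "B", 0.0)] * 2 + [("A", CCW, "C", -8.0)] * 3
    + [("B", CCW, "A", -10.0)] * 4 + [("B", CCW, "C", 0.0)]
    + [("C", CCW, "A", 0.0)] + [("C", CCW, "B", -8.0)] * 4
)


def estimate_model(samples):
    """Return (T, R) dicts keyed by (s, a, s'), built from sample counts."""
    n_sa = defaultdict(int)      # number of times (s, a) was tried
    n_sas = defaultdict(int)     # number of times (s, a) led to s'
    r_sum = defaultdict(float)   # total reward observed for (s, a, s')
    for s, a, s2, r in samples:
        n_sa[(s, a)] += 1
        n_sas[(s, a, s2)] += 1
        r_sum[(s, a, s2)] += r
    T = {k: n / n_sa[k[:2]] for k, n in n_sas.items()}
    R = {k: r_sum[k] / n for k, n in n_sas.items()}
    return T, R


def q_value_iteration(T, R, gamma, iterations=100):
    """Run a fixed number of Q-value iteration sweeps on the estimated model."""
    states = sorted({k[0] for k in T})
    actions = sorted({k[1] for k in T})
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(iterations):
        # Value of each state under the current Q estimate.
        V = {s: max(Q[(s, a)] for a in actions) for s in states}
        Q = {
            (s, a): sum(
                p * (R[(s0, a0, s2)] + gamma * V[s2])
                for (s0, a0, s2), p in T.items()
                if (s0, a0) == (s, a)
            )
            for s in states
            for a in actions
        }
    return Q


T, R = estimate_model(samples)
Q = q_value_iteration(T, R, gamma=0.5)
# Greedy policy: in each state, pick the action with the larger Q-value.
policy = {s: max((CW, CCW), key=lambda a: Q[(s, a)]) for s in "ABC"}
```

Printing `T[("A", CW, "B")]`, `R[("A", CW, "B")]`, `T[("A", CW, "C")]` and `R[("A", CW, "C")]` gives the estimates for the two Clockwise rows of state A under this reading of the samples, and `policy` gives the greedy action in each state for the estimated model.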
