Mathematics, 30.03.2021 19:40 zafyafimli

Consider an MDP with 3 states, A, B and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP; instead, we are given samples of what an agent experiences when it interacts with the environment (we do know, however, that the agent never remains in the same state after taking an action). In this problem, we will first estimate the model (the transition function and the reward function), and then use the estimated model to find the optimal actions. To find the optimal actions, model-based RL computes the optimal V or Q value function with respect to the estimated T and R. This can be done with value iteration, policy iteration, or Q-value iteration. Last week you already solved some exercises that involved value iteration and policy iteration, so we will use Q-value iteration in this exercise.
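For reference, the standard Q-value iteration update with respect to an estimated model can be written as follows. The hats on T and R and the iteration index k are notation assumed here, not taken from the question; γ is the discount factor.

```latex
\[
Q_{k+1}(s, a) \;=\; \sum_{s'} \hat{T}(s, a, s') \left[ \hat{R}(s, a, s') + \gamma \max_{a'} Q_k(s', a') \right],
\qquad Q_0(s, a) = 0 .
\]
```

The greedy action in state s is then the a that maximizes the converged Q(s, a).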
Consider the following samples that the agent encountered.
Each sample is a tuple (s, a, s', r).

Clockwise samples (15 samples; the row alignment is lost in this copy, tokens are listed in source order):
B 0.0 A -3.0 Clockwise B Clockwise Clockwise A C A 0.0 B 0.0 B 6.0 Clockwise Clockwise Clockwise A A 3.0 C A -3.0 B 0.0 B 6.0 Clockwise B Clockwise Clockwise A C A 3.0 C -10.0 A 0.0 Clockwise A B Clockwise Clockwise C 0.0 C -10.0 A 0.0 Clockwise Clockwise Clockwise A C A

Counterclockwise samples:
s   a                  s'   r
A   Counterclockwise   C    -8.0
B   Counterclockwise   A    -10.0
C   Counterclockwise   B    -8.0
A   Counterclockwise   C    -8.0
B   Counterclockwise   A    -10.0
C   Counterclockwise   B    -8.0
C   Counterclockwise   B    -8.0
B   Counterclockwise   A    -10.0
A   Counterclockwise   B    0.0
A   Counterclockwise   B    0.0
B   Counterclockwise   A    -10.0
C   Counterclockwise   A    0.0
B   Counterclockwise   C    0.0
A   Counterclockwise   C    -8.0
C   Counterclockwise   B    -8.0
We start by estimating the transition function, T(s, a, s'), and the reward function, R(s, a, s'), for this MDP. Fill in the missing values in the following table for T(s, a, s') and R(s, a, s').
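One common way to estimate the model is by counting: T(s, a, s') is the fraction of (s, a) samples that ended in s', and R(s, a, s') is the average reward observed on that transition. The sketch below applies this to the 15 Counterclockwise samples only, as read from the sample listing. The function name estimate_model and the variable names are illustrative and not part of the original question.

```python
from collections import Counter, defaultdict

CCW = "Counterclockwise"

# Counterclockwise samples (s, a, s', r) as read from the sample listing.
samples = [
    ("A", CCW, "C", -8.0), ("B", CCW, "A", -10.0), ("C", CCW, "B", -8.0),
    ("A", CCW, "C", -8.0), ("B", CCW, "A", -10.0), ("C", CCW, "B", -8.0),
    ("C", CCW, "B", -8.0), ("B", CCW, "A", -10.0), ("A", CCW, "B", 0.0),
    ("A", CCW, "B", 0.0), ("B", CCW, "A", -10.0), ("C", CCW, "A", 0.0),
    ("B", CCW, "C", 0.0), ("A", CCW, "C", -8.0), ("C", CCW, "B", -8.0),
]


def estimate_model(samples):
    """Count-based estimates: T is a relative frequency, R a mean reward."""
    sa_counts = Counter()             # how often (s, a) was tried
    sas_counts = Counter()            # how often (s, a) led to s'
    reward_sums = defaultdict(float)  # total reward seen on (s, a, s')
    for s, a, s_next, r in samples:
        sa_counts[(s, a)] += 1
        sas_counts[(s, a, s_next)] += 1
        reward_sums[(s, a, s_next)] += r
    T = {k: n / sa_counts[k[:2]] for k, n in sas_counts.items()}
    R = {k: reward_sums[k] / n for k, n in sas_counts.items()}
    return T, R


T_hat, R_hat = estimate_model(samples)
for key in sorted(T_hat):
    print(key, round(T_hat[key], 3), round(R_hat[key], 3))
```

For these samples the estimates agree with the Counterclockwise rows already filled in in the table, for example T(A, Counterclockwise, C) = 3/5 = 0.6 with R = -8.0. The same counting applied to the Clockwise samples gives the missing entries.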
Discount factor, γ = 0.5

s   a                  s'   T(s, a, s')   R(s, a, s')
A   Clockwise          B    M
A   Clockwise          C    P
A   Counterclockwise   B    0.400         0.000
A   Counterclockwise   C    0.600         -8.000
B   Clockwise          A    0.800         -3.000
B   Clockwise          C    0.200         0.000
B   Counterclockwise   A    0.800         -10.000
B   Counterclockwise   C    0.200         0.000
C   Clockwise          A    0.600         0.000
C   Clockwise          B    0.400         6.000
C   Counterclockwise   A    0.200         0.000
C   Counterclockwise   B    0.800         -8.000
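Once the table is complete, Q-value iteration repeatedly applies a one-step backup to every (s, a) pair. The sketch below performs only the first sweep, starting from Q_0 = 0, for the five (s, a) pairs whose rows are fully given. Two assumptions are made: (A, Clockwise) is left out because its entries are the blanks to be filled in, and the next-state labels of the B and C Clockwise rows are inferred from the row order of the table. The names model and q_backup are illustrative.

```python
GAMMA = 0.5  # discount factor given with the table

CW, CCW = "Clockwise", "Counterclockwise"

# (s, a) -> list of (s', T, R), taken from the filled-in rows of the table.
# (A, Clockwise) is omitted on purpose: its values are the unknowns.
# Next-state labels for the Clockwise rows are an assumption (row order).
model = {
    ("A", CCW): [("B", 0.4, 0.0), ("C", 0.6, -8.0)],
    ("B", CW): [("A", 0.8, -3.0), ("C", 0.2, 0.0)],
    ("B", CCW): [("A", 0.8, -10.0), ("C", 0.2, 0.0)],
    ("C", CW): [("A", 0.6, 0.0), ("B", 0.4, 6.0)],
    ("C", CCW): [("A", 0.2, 0.0), ("B", 0.8, -8.0)],
}


def q_backup(model, q_prev, gamma):
    """One sweep: Q_new(s,a) = sum_s' T * (R + gamma * max_a' Q_prev(s',a'))."""
    def best(s_next):
        # Max over the actions of s_next that are present in q_prev.
        values = [q for (s, _), q in q_prev.items() if s == s_next]
        return max(values, default=0.0)

    return {
        sa: sum(t * (r + gamma * best(s_next)) for s_next, t, r in rows)
        for sa, rows in model.items()
    }


q0 = {sa: 0.0 for sa in model}   # Q_0 is zero everywhere
q1 = q_backup(model, q0, GAMMA)  # first sweep only
for sa in sorted(q1):
    print(sa, round(q1[sa], 3))
```

Because Q_0 is zero, the first sweep reduces to the expected immediate reward, for example Q_1(C, Clockwise) = 0.6 · 0 + 0.4 · 6 = 2.4. Later sweeps need the (A, Clockwise) row as well, since the maximum over A's actions would otherwise be incomplete.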
