Consider an MDP with 3 states, A, B and C, and 2 actions, Clockwise and Counterclockwise. We do not know the transition function or the reward function for the MDP; instead, we are given samples of what an agent actually experiences when it interacts with the environment (although we do know that we do not remain in the same state after taking an action). In this problem, instead of first estimating the transition and reward functions, we will directly estimate the Q function using Q-learning. Assume the discount factor γ is 0.5 and the step size for Q-learning α is 0.5.

Our current Q function, Q(s, a), is:

                      A        B        C
Clockwise           1.501   -0.451    2.73
Counterclockwise    3.153   -6.055    2.133

The agent encounters the following samples:

s          a                  s'    r
A          Counterclockwise   C     8.0
(missing)  Counterclockwise   A     0.0
Provide the Q-values for all pairs of (state, action) after both samples have been accounted for.
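
The update asked for here is the standard tabular Q-learning rule, Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max over a' of Q(s', a')]. The following is a minimal Python sketch of that rule applied to the values given in the question; the names `initial_q`, `q_update` and `ASSUMED_SECOND_STATE` are illustrative and not part of the original problem. The start state of the second sample is not legible in the question as posted, so the sketch treats it as an assumption (C, one of the two states the no-self-transition hint allows when s' = A) and keeps it in a single variable that can be changed to B.

```python
# Minimal sketch of tabular Q-learning for the question above.
GAMMA = 0.5   # discount factor, from the question
ALPHA = 0.5   # step size, from the question
ACTIONS = ("Clockwise", "Counterclockwise")


def initial_q():
    """Current Q(s, a) values as given in the question."""
    return {
        ("A", "Clockwise"): 1.501,
        ("B", "Clockwise"): -0.451,
        ("C", "Clockwise"): 2.73,
        ("A", "Counterclockwise"): 3.153,
        ("B", "Counterclockwise"): -6.055,
        ("C", "Counterclockwise"): 2.133,
    }


def q_update(q, s, a, s_next, r, alpha=ALPHA, gamma=GAMMA):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    target = r + gamma * best_next
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return q[(s, a)]


q = initial_q()

# Sample 1 is fully specified: (A, Counterclockwise, C, 8.0).
first = q_update(q, "A", "Counterclockwise", "C", 8.0)

# ASSUMPTION: the start state of sample 2 is missing in the source.
# "C" is assumed here; set this to "B" to evaluate the other possibility.
ASSUMED_SECOND_STATE = "C"
second = q_update(q, ASSUMED_SECOND_STATE, "Counterclockwise", "A", 0.0)

for (state, action), value in sorted(q.items()):
    print(f"Q({state}, {action}) = {value:.5f}")
```

The first sample changes only Q(A, Counterclockwise), which becomes 0.5 · 3.153 + 0.5 · (8.0 + 0.5 · 2.73) = 6.259. Under the assumption that the second sample starts in C, Q(C, Counterclockwise) becomes 0.5 · 2.133 + 0.5 · (0.0 + 0.5 · 6.259) = 2.63125, where the max over actions in A already uses the updated value from the first sample. All other entries keep their current values.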

