Introduction to CS & Engineering (0CCS0CSE)
Assignment 23: Episode
1 Value Function
Implementing Eq. 1 can cause confusion because V(S) appears on both sides of the equation
and, in Python, V(S) is a dictionary. This document explains lines 23 to 25 of Algorithm 1.
$$V(S_t) = V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right] \qquad (1)$$
Although lines 23 and 24 of Algorithm 1 appear to update the valueFunction dictionary,
they do not. Lines 23 and 24 only retrieve information from the value function dictionary.
Introducing two new variables, v_st1 and v_st0, to replace V(S_{t+1}) and V(S_t) helps
to clarify that only line 25 changes the dictionary.
v_st1 ⇐ GetValueOf(board)
v_st0 ⇐ GetValueOf(previousState)
V(S_t) ⇐ v_st0 + session.learningRate × (reward + (session.discountRate × v_st1) − v_st0)
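The three steps above can be sketched in Python as follows. This is a minimal sketch, assuming the value function is a plain dict keyed by state strings. The function name `td_update`, its parameters and the example keys are illustrative assumptions, not part of the assignment's code.

```python
def td_update(value_function, prev_key, next_key, reward,
              learning_rate, discount_rate):
    """Apply Eq. 1 once. All names here are hypothetical stand-ins."""
    # Reads only: these two lines do not modify the dictionary.
    v_st1 = value_function.get(next_key, 0.0)  # plays the role of V(S_{t+1})
    v_st0 = value_function.get(prev_key, 0.0)  # plays the role of V(S_t)

    # The single write: store the new estimate for the previous state.
    value_function[prev_key] = v_st0 + learning_rate * (
        reward + discount_rate * v_st1 - v_st0
    )
    return value_function[prev_key]


# Example values chosen purely for illustration.
values = {"state_a": 0.5, "state_b": 1.0}
new_value = td_update(values, "state_a", "state_b", reward=0.0,
                      learning_rate=0.1, discount_rate=0.9)
print(new_value)
```

With V(S_t) = 0.5, V(S_{t+1}) = 1.0, no reward, a learning rate of 0.1 and a discount rate of 0.9, the update moves V(S_t) to approximately 0.54, while the entry for the next state is left untouched.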
Furthermore, GetValueOf(...) is a multistep process: (1) get the key from the board;
(2) check whether the key is in valueFunction. Either (i) the key is in valueFunction, in
which case return the value associated with the key in the dictionary, e.g.,
return self.valueFunction[key], or (ii) the key is not in valueFunction, in which case add
the key to the dictionary, initialise its value to zero and return 0. It would be best to
add a new method, getValueOf(self, board), which does all of this. In lines 23 and 24 of
Algorithm 1, both board and previousState are TicTacToe objects.
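The steps above could be sketched as follows. The `Agent` class, the `StubBoard` stand-in and its `getKey()` method are assumptions made so that the example runs on its own. The real TicTacToe class may expose its key under a different name.

```python
class StubBoard:
    """Hypothetical stand-in for a TicTacToe object."""

    def __init__(self, cells):
        self.cells = cells  # e.g. "X--------" for a board with one X

    def getKey(self):
        # Assumed helper: turns the board into a hashable dictionary key.
        return self.cells


class Agent:
    """Hypothetical owner of the valueFunction dictionary."""

    def __init__(self):
        self.valueFunction = {}

    def getValueOf(self, board):
        # (1) Get the key from the board.
        key = board.getKey()
        # (2ii) Unseen state: add the key and initialise its value to zero.
        if key not in self.valueFunction:
            self.valueFunction[key] = 0.0
        # (2i) Return the value associated with the key.
        return self.valueFunction[key]


agent = Agent()
board = StubBoard("X--------")
first_lookup = agent.getValueOf(board)   # key is added with value 0.0
agent.valueFunction[board.getKey()] = 0.7
second_lookup = agent.getValueOf(board)  # existing value is returned
```

Putting the "add if missing" step inside the method means lines 23 and 24 of Algorithm 1 can call it for any board without first checking whether the state has been seen.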
Algorithm 1: This method executes a single Tic-Tac-Toe game and updates the state value
table after every move played by the RL agent.
 1: procedure episode(board, opponent, session)
 2:
 3:     result ⇐ True
 4:     turn ⇐ 0
 5:     previousState ⇐ CopyBoard()
 6:
 7:     while not board.isGameOver() and result do
 8:         if turn > 1 then
 9:             turn ⇐ 0
10:         end if
11:
12:         agentMoved ⇐ False
13:
14:         if (turn is 0 and session.agentFirst) or (turn is 1 and not session.agentFirst) then
15:             result ⇐ makeTrainingMove(board, session.epsilon)
16:             agentMoved ⇐ True
17:         else
18:             result ⇐ opponent.makeMove(board)
19:         end if
20:
21:         if agentMoved then
22:             reward ⇐ getReward(board)
23:             V(S_{t+1}) ⇐ GetValueOf(board)
24:             V(S_t) ⇐ GetValueOf(previousState)
25:             V(S_t) ⇐ V(S_t) + session.learningRate × (reward + (session.discountRate × V(S_{t+1})) − V(S_t))
26:             previousState ⇐ CopyBoard()
27:
28:         end if
29:
30:         turn ⇐ turn + 1
31:     end while
32:
33:     reward ⇐ getReward(board)
34:     V(S_{t+1}) ⇐ GetValueOf(board)
35:     V(S_{t+1}) ⇐ V(S_{t+1}) + session.learningRate × reward
36: end procedure
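Lines 33 to 35 apply a different rule from line 25: the value of the final board is increased by the learning rate times the final reward, with no discounted successor term. A rough sketch of that last step, assuming a dict-based value table; the helper name `final_update` and the string key are hypothetical:

```python
def final_update(value_function, final_key, reward, learning_rate):
    """Mirror lines 33-35 of Algorithm 1. Names are hypothetical."""
    # Line 34: look up the final state's value, adding it as 0.0 if unseen.
    v_final = value_function.setdefault(final_key, 0.0)
    # Line 35: the only write, which moves the value by learning_rate * reward.
    value_function[final_key] = v_final + learning_rate * reward
    return value_function[final_key]


# Example: an unseen winning board, with a reward of 1.0 chosen for illustration.
values = {}
updated = final_update(values, "XXXOO----", reward=1.0, learning_rate=0.1)
print(updated)
```

As with line 25, the lookup and the write are kept on separate lines so it is clear which statement changes the dictionary.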
