$$V(s)=R(s)+\gamma \sum_{s^{\prime} \in S} p\left(s^{\prime} \mid s\right) V\left(s^{\prime}\right)$$
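The equation for $V(s)$ defines the value as a fixed point, so one way to compute it is to apply the right-hand side repeatedly until the values stop changing. The sketch below is only an illustration: the function name `evaluate_mrp`, the two-state chain, and the numbers in it are assumptions made for this example, not something taken from the original post.

```python
def evaluate_mrp(rewards, transitions, gamma, tol=1e-10, max_iters=10_000):
    """Iterate V <- R + gamma * P V until the largest change falls below tol."""
    n = len(rewards)
    values = [0.0] * n
    for _ in range(max_iters):
        new_values = [
            rewards[s] + gamma * sum(transitions[s][s2] * values[s2] for s2 in range(n))
            for s in range(n)
        ]
        delta = max(abs(new_values[s] - values[s]) for s in range(n))
        values = new_values
        if delta < tol:
            break
    return values


# Hypothetical chain: state 0 pays reward 1 and stays or leaves with equal
# probability; state 1 is absorbing and pays nothing.
rewards = [1.0, 0.0]
transitions = [
    [0.5, 0.5],  # p(s' | s = 0)
    [0.0, 1.0],  # p(s' | s = 1)
]
gamma = 0.9

values = evaluate_mrp(rewards, transitions, gamma)
print(values)
```

For this chain the equation can also be solved by hand: $V(1)=0$ and $V(0)=1+0.45\,V(0)$, so $V(0)=1/0.55$, which is what the iteration approaches.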
$$Q(s, a)=R(s, a)+\gamma \sum_{s^{\prime} \in S} p\left(s^{\prime} \mid s, a\right) V\left(s^{\prime}\right)$$
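Given a table of state values, the equation for $Q(s,a)$ is a one-step lookahead: the immediate reward plus the discounted expected value of the next state. A minimal sketch follows, assuming nested lists indexed as `rewards[s][a]` and `transitions[s][a][s']`. The helper name `q_from_v` and all numbers are made up for illustration.

```python
def q_from_v(rewards, transitions, values, gamma):
    """Return Q[s][a] = R[s][a] + gamma * sum_s' p(s' | s, a) * V[s']."""
    return [
        [
            rewards[s][a]
            + gamma * sum(p * v for p, v in zip(transitions[s][a], values))
            for a in range(len(rewards[s]))
        ]
        for s in range(len(rewards))
    ]


# Hypothetical 2-state, 2-action problem.
rewards = [[0.0, 1.0], [2.0, 0.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: action 0 stays, action 1 moves to state 1
    [[0.5, 0.5], [0.0, 1.0]],  # from state 1: action 0 is a coin flip, action 1 stays
]
values = [1.0, 3.0]  # assumed state values, chosen only for the example
gamma = 0.5

q = q_from_v(rewards, transitions, values, gamma)
print(q)
```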
State-value function $V_{\pi}(s)=\mathbb{E}_{\pi}\left[G_{t} \mid s_{t}=s\right]$, action-value function $Q_{\pi}(s, a)=\mathbb{E}_{\pi}\left[G_{t} \mid s_{t}=s, a_{t}=a\right]$
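Both definitions are expectations of the return $G_t$, so they can be approximated by averaging returns over sampled episodes. The sketch below assumes the usual discounted return $G_t=r_{t+1}+\gamma r_{t+2}+\gamma^{2} r_{t+3}+\cdots$, which the definitions rely on but do not spell out. The function names and the two reward sequences are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Compute r_1 + gamma * r_2 + gamma^2 * r_3 + ... by working backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def monte_carlo_value(episodes, gamma):
    """Average the returns of episodes that all start from the same state."""
    returns = [discounted_return(ep, gamma) for ep in episodes]
    return sum(returns) / len(returns)


# Two made-up reward sequences observed after visiting the same state s.
episodes = [[1.0, 1.0, 1.0], [0.0, 2.0]]
gamma = 0.5

estimate = monte_carlo_value(episodes, gamma)
print(estimate)
```

Restricting the average to episodes that also begin with the same action $a$ would give an estimate of $Q_{\pi}(s,a)$ in the same way.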
$$V_{\pi}(s)=\mathbb{E}_{\pi}\left[r_{t+1}+\gamma V_{\pi}\left(s_{t+1}\right) \mid s_{t}=s\right]$$
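Written out for a known model, the expectation in the equation for $V_{\pi}(s)$ becomes a sum over actions weighted by $\pi(a \mid s)$ and over next states weighted by $p(s^{\prime} \mid s, a)$. Applying that sum repeatedly gives iterative policy evaluation. The sketch below assumes a tabular model with rewards of the form $R(s,a)$. The name `policy_evaluation` and the tiny MDP are illustrative assumptions.

```python
def policy_evaluation(policy, rewards, transitions, gamma, tol=1e-10, max_iters=10_000):
    """Iterate V(s) <- sum_a pi(a|s) * (R(s,a) + gamma * sum_s' p(s'|s,a) * V(s'))."""
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(max_iters):
        new_values = []
        for s in range(n_states):
            v = 0.0
            for a, pi_sa in enumerate(policy[s]):
                expected_next = sum(
                    p * values[s2] for s2, p in enumerate(transitions[s][a])
                )
                v += pi_sa * (rewards[s][a] + gamma * expected_next)
            new_values.append(v)
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values


# Hypothetical MDP: in state 0, action 0 stays (reward 1) and action 1 moves
# to the absorbing, zero-reward state 1.
rewards = [[1.0, 0.0], [0.0, 0.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
policy = [[0.5, 0.5], [0.5, 0.5]]  # uniform random policy
gamma = 0.9

values = policy_evaluation(policy, rewards, transitions, gamma)
print(values)
```

Solving by hand for this example gives $V_{\pi}(0)=0.5+0.45\,V_{\pi}(0)$, so $V_{\pi}(0)=0.5/0.55$.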
$$Q_{\pi}(s, a)=\mathbb{E}_{\pi}\left[r_{t+1}+\gamma Q_{\pi}\left(s_{t+1}, a_{t+1}\right) \mid s_{t}=s, a_{t}=a\right]$$
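The equation for $Q_{\pi}(s,a)$ can be iterated in the same spirit: the next action $a_{t+1}$ is drawn from the policy, so the inner expectation is $\sum_{a^{\prime}} \pi(a^{\prime} \mid s^{\prime})\, Q_{\pi}(s^{\prime}, a^{\prime})$. A minimal sketch, again assuming a known tabular model. `evaluate_q` and the example MDP are hypothetical names and values chosen for illustration.

```python
def evaluate_q(policy, rewards, transitions, gamma, tol=1e-10, max_iters=10_000):
    """Iterate Q(s,a) <- R(s,a) + gamma * sum_s' p(s'|s,a) * sum_a' pi(a'|s') * Q(s',a')."""
    n_states = len(rewards)
    n_actions = len(rewards[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        # Expected Q under the policy in each state, i.e. the current V estimate.
        state_values = [
            sum(policy[s][a] * q[s][a] for a in range(n_actions))
            for s in range(n_states)
        ]
        new_q = [
            [
                rewards[s][a]
                + gamma * sum(p * state_values[s2] for s2, p in enumerate(transitions[s][a]))
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        delta = max(
            abs(new_q[s][a] - q[s][a])
            for s in range(n_states)
            for a in range(n_actions)
        )
        q = new_q
        if delta < tol:
            break
    return q


# Hypothetical MDP: in state 0, action 0 stays (reward 1) and action 1 moves
# to the absorbing, zero-reward state 1.
rewards = [[1.0, 0.0], [0.0, 0.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
policy = [[0.5, 0.5], [0.5, 0.5]]  # uniform random policy
gamma = 0.9

q = evaluate_q(policy, rewards, transitions, gamma)
print(q)
```

In this example $Q_{\pi}(0,0)=1+0.45\,Q_{\pi}(0,0)$, so $Q_{\pi}(0,0)=1/0.55$, and averaging over the policy recovers $V_{\pi}(0)=\sum_{a} \pi(a \mid 0)\, Q_{\pi}(0,a)=0.5/0.55$.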
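Copyright notice: this article comes from CSDN, with thanks to the original author. It follows the CC 4.0 BY-SA license; when reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_41936559/article/details/123611224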