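Algorithm 1 Quantile Regression Q-Learning [5]
Require: $N$, $\kappa$
Input: $s, a, r, s'$, $\gamma \in [0, 1)$
# Compute distributional Bellman target
$Q(s', a') := \sum_j q_j \theta_j(s', a')$, where $q_j = 1/N$
$a^* \leftarrow \arg\max_{a'} Q(s', a')$
$\mathcal{T}\theta_j \leftarrow r + \gamma \theta_j(s', a^*), \quad \forall j$
# Compute quantile regression loss
Output: $\sum_{i=1}^{N} \mathbb{E}_j \left[ \rho^{\kappa}_{\hat{\tau}_i} \left( \mathcal{T}\theta_j - \theta_i(s, a) \right) \right]$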
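The steps of Algorithm 1 can be sketched for a single transition as follows. This is a minimal illustration and not the implementation from [5]. It rests on several assumptions that the listing does not state: the quantile midpoints are taken as $\hat{\tau}_i = (2i - 1)/(2N)$, $\rho^{\kappa}_{\tau}(u)$ is taken to be the quantile Huber loss $|\tau - \mathbb{1}\{u < 0\}| \, L_{\kappa}(u)$ (falling back to the plain quantile loss when $\kappa = 0$), and the quantile estimates are passed in as plain Python lists rather than produced by a network. The function names `huber`, `quantile_huber`, and `qr_q_learning_loss` are hypothetical.

```python
def huber(u, kappa):
    """Huber loss L_kappa(u); assumes kappa > 0."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber(u, tau, kappa):
    """Assumed form: |tau - 1{u < 0}| * L_kappa(u); plain quantile loss if kappa == 0."""
    weight = abs(tau - (1.0 if u < 0 else 0.0))
    return weight * (abs(u) if kappa == 0 else huber(u, kappa))


def qr_q_learning_loss(theta_sa, theta_next, r, gamma, kappa=1.0):
    """Loss of Algorithm 1 for one transition (s, a, r, s').

    theta_sa:   list of N quantile estimates theta_i(s, a) for the action taken.
    theta_next: list of lists, theta_next[a'][j] = theta_j(s', a').
    """
    n = len(theta_sa)

    # Q(s', a') = sum_j q_j * theta_j(s', a') with q_j = 1/N
    q_next = [sum(row) / n for row in theta_next]

    # a* = argmax_{a'} Q(s', a')
    a_star = max(range(len(q_next)), key=q_next.__getitem__)

    # T theta_j = r + gamma * theta_j(s', a*), for all j
    target = [r + gamma * th for th in theta_next[a_star]]

    # Quantile midpoints (assumption: tau_hat_i = (2i - 1) / (2N), i = 1..N)
    tau_hat = [(2 * i + 1) / (2.0 * n) for i in range(n)]

    # sum_i E_j[ rho(T theta_j - theta_i(s, a)) ], with E_j taken as a mean over j
    loss = 0.0
    for i in range(n):
        loss += sum(
            quantile_huber(t - theta_sa[i], tau_hat[i], kappa) for t in target
        ) / n
    return loss


# Two actions, N = 2 quantiles; action 1 has the larger mean and is selected.
loss = qr_q_learning_loss(
    theta_sa=[1.0, 2.0],
    theta_next=[[0.0, 0.0], [0.0, 2.0]],
    r=1.0,
    gamma=0.5,
    kappa=1.0,
)
print(loss)  # 0.125
```

In this example the targets are $\mathcal{T}\theta = (1, 2)$, which already match the current estimates $\theta(s, a) = (1, 2)$ as a set. The loss is still nonzero because every estimate $\theta_i$ is compared against every target $\mathcal{T}\theta_j$, not only the one with the same index.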