
Algorithm 1 Quantile Regression Q Learning [5]

Require: N, κ

Input: $s, a, r, s'$, $\gamma \in [0, 1)$

#Compute distributional Bellman target

$Q(s', a') := \sum_j q_j \, \theta_j(s', a')$, where $q_j = 1/N$

$a^* \leftarrow \arg\max_{a'} Q(s', a')$

$\mathcal{T}\theta_j \leftarrow r + \gamma \, \theta_j(s', a^*), \quad \forall j$

#Compute quantile regression loss

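Output: $\sum_{i=1}^{N} \mathbb{E}_j \left[ \rho^{\kappa}_{\hat{\tau}_i}\left( \mathcal{T}\theta_j - \theta_i(s, a) \right) \right]$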
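The steps of the algorithm can be traced for a single transition with a minimal pure-Python sketch. It assumes $\theta(s', \cdot)$ is given as a list of $N$ quantile estimates per action and $\theta(s, a)$ as a list of $N$ estimates; the function names (`q_values`, `bellman_target`, `quantile_huber`, `qr_loss`) are hypothetical. The box does not spell out $\hat{\tau}_i$ or $\rho^{\kappa}_{\tau}$, so the sketch assumes the quantile midpoints $\hat{\tau}_i = (2i - 1)/(2N)$ and the quantile Huber loss $\rho^{\kappa}_{\tau}(u) = |\tau - \delta_{u<0}| \, \mathcal{L}_{\kappa}(u)$, where $\mathcal{L}_{\kappa}$ is the Huber loss with threshold $\kappa$, falling back to the plain quantile loss when $\kappa = 0$. Terminal states and network training are out of scope.

```python
from typing import List, Sequence


def q_values(theta_next: Sequence[Sequence[float]]) -> List[float]:
    """Q(s', a') = sum_j q_j * theta_j(s', a') with q_j = 1/N, for every action a'."""
    return [sum(quantiles) / len(quantiles) for quantiles in theta_next]


def bellman_target(r: float, gamma: float,
                   theta_next: Sequence[Sequence[float]]) -> List[float]:
    """Greedy action a* under Q(s', .), then T theta_j = r + gamma * theta_j(s', a*) for all j."""
    q = q_values(theta_next)
    a_star = max(range(len(q)), key=q.__getitem__)  # ties go to the lowest index
    return [r + gamma * theta_j for theta_j in theta_next[a_star]]


def quantile_huber(u: float, tau: float, kappa: float) -> float:
    """Assumed form: rho^kappa_tau(u) = |tau - 1{u < 0}| * L_kappa(u).

    With kappa == 0 this reduces to the plain quantile (pinball) loss.
    """
    if kappa == 0.0:
        magnitude = abs(u)
    elif abs(u) <= kappa:
        magnitude = 0.5 * u * u
    else:
        magnitude = kappa * (abs(u) - 0.5 * kappa)
    indicator = 1.0 if u < 0 else 0.0
    return abs(tau - indicator) * magnitude


def qr_loss(theta_sa: Sequence[float], target: Sequence[float], kappa: float) -> float:
    """sum_{i=1}^{N} E_j[ rho^kappa_{tau_hat_i}(T theta_j - theta_i(s, a)) ]."""
    n = len(theta_sa)
    loss = 0.0
    for i, theta_i in enumerate(theta_sa, start=1):
        tau_hat = (2 * i - 1) / (2 * n)  # assumed quantile midpoints
        # E_j as a uniform average over the N target samples (q_j = 1/N).
        loss += sum(quantile_huber(t - theta_i, tau_hat, kappa) for t in target) / len(target)
    return loss


if __name__ == "__main__":
    theta_next = [[0.0, 1.0], [2.0, 4.0]]  # 2 actions, N = 2 quantiles each
    target = bellman_target(r=1.0, gamma=0.5, theta_next=theta_next)
    print(target)  # [2.0, 3.0]
    print(qr_loss([2.0, 3.0], target, kappa=1.0))  # 0.125
```

In the usage example, action 1 has the larger mean ($Q = 3.0$ versus $0.5$), so its quantiles are shifted and discounted into the target. The loss is nonzero even though $\theta(s, a)$ equals the target samples, because each $\theta_i$ is penalized asymmetrically against every $\mathcal{T}\theta_j$ according to its own $\hat{\tau}_i$.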