Algorithm 1: Quantile Regression Q-Learning [5]
Require: N, κ
Input: $s, a, r, s', \gamma \in [0, 1)$
# Compute distributional Bellman target
$Q(s', a') := \sum_j q_j \theta_j(s', a')$, where $q_j = 1/N$
$a^* \leftarrow \arg\max_{a'} Q(s', a')$
$\mathcal{T}\theta_j \leftarrow r + \gamma \theta_j(s', a^*), \quad \forall j$
# Compute quantile regression loss
Output: $\sum_{i=1}^{N} \mathbb{E}_j\left[\rho^{\kappa}_{\hat{\tau}_i}\left(\mathcal{T}\theta_j - \theta_i(s, a)\right)\right]$
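
The steps of Algorithm 1 can be sketched in plain Python for a single transition. This is a minimal illustration under stated assumptions, not the implementation from [5]: the function names (`huber`, `quantile_huber`, `qr_q_learning_loss`) are hypothetical, the quantile midpoints are assumed to be $\hat{\tau}_i = (2i - 1) / (2N)$, the loss $\rho^{\kappa}_{\tau}(u)$ is assumed to be the quantile Huber loss $|\tau - \mathbb{1}\{u < 0\}| \, L_\kappa(u)$ (falling back to the plain quantile loss when $\kappa = 0$), and the quantile estimates are passed in as plain lists rather than produced by a network.

```python
from typing import List, Sequence, Tuple


def huber(u: float, kappa: float) -> float:
    """Huber loss L_kappa(u): quadratic on [-kappa, kappa], linear outside."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber(u: float, tau: float, kappa: float) -> float:
    """Assumed form of rho^kappa_tau(u): |tau - 1{u < 0}| * L_kappa(u)."""
    weight = abs(tau - (1.0 if u < 0 else 0.0))
    if kappa == 0:
        # Plain quantile (pinball) loss when no Huber smoothing is used.
        return weight * abs(u)
    return weight * huber(u, kappa)


def qr_q_learning_loss(
    theta_sa: Sequence[float],
    theta_next: Sequence[Sequence[float]],
    r: float,
    gamma: float,
    kappa: float = 1.0,
) -> Tuple[float, int]:
    """Return (loss, a_star) for one transition (s, a, r, s').

    theta_sa:   N quantile estimates theta_i(s, a).
    theta_next: one list of N quantile estimates theta_j(s', a') per action a'.
    """
    n = len(theta_sa)

    # Compute distributional Bellman target.
    # Q(s', a') = sum_j q_j * theta_j(s', a') with q_j = 1/N.
    q_next: List[float] = [sum(th) / n for th in theta_next]
    a_star = max(range(len(q_next)), key=q_next.__getitem__)
    targets = [r + gamma * th for th in theta_next[a_star]]

    # Assumed quantile midpoints tau_hat_i = (2i - 1) / (2N), i = 1..N.
    taus = [(2 * i + 1) / (2 * n) for i in range(n)]

    # Compute quantile regression loss: sum over i, mean over j.
    loss = 0.0
    for i in range(n):
        per_target = (quantile_huber(t - theta_sa[i], taus[i], kappa) for t in targets)
        loss += sum(per_target) / n
    return loss, a_star


if __name__ == "__main__":
    # Toy transition with N = 2 quantiles and two actions in s'.
    loss, a_star = qr_q_learning_loss(
        theta_sa=[0.0, 1.0],
        theta_next=[[0.0, 0.0], [1.0, 3.0]],
        r=1.0,
        gamma=0.5,
        kappa=1.0,
    )
    print(a_star, loss)
```

In a learning setting the returned loss would be differentiated with respect to the parameters that produce `theta_sa`, with the targets held fixed; the sketch only evaluates the scalar loss.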