Algorithm 1 Quantile Regression Q-Learning [5]

Require: N, κ

Input $s, a, r, s', \gamma \in [0, 1)$

#Compute distributional Bellman target

$Q(s', a') := \sum_j q_j \theta_j(s', a')$, where $q_j = 1/N$

$a^* \leftarrow \arg\max_{a'} Q(s', a')$

$\mathcal{T}\theta_j \leftarrow r + \gamma \theta_j(s', a^*), \quad \forall j$

#Compute quantile regression loss

Output $\sum_{i=1}^{N} \mathbb{E}_j \left[ \rho^{\kappa}_{\hat{\tau}_i} \left( \mathcal{T}\theta_j - \theta_i(s, a) \right) \right]$
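
The steps of Algorithm 1 can be sketched in plain Python for a single transition. This is a minimal illustration, not the implementation of [5]. It assumes the quantile midpoints $\hat{\tau}_i = (2i - 1)/(2N)$ and the quantile Huber loss $\rho^{\kappa}_{\tau}(u) = |\tau - \mathbb{1}\{u < 0\}| \, L_{\kappa}(u)$, falling back to the plain quantile loss when $\kappa = 0$. The function names and the nested-list layout `theta[action][j]` are hypothetical choices made for this example.

```python
def huber(u, kappa):
    """Huber loss L_kappa(u): quadratic for |u| <= kappa, linear beyond."""
    if abs(u) <= kappa:
        return 0.5 * u * u
    return kappa * (abs(u) - 0.5 * kappa)


def quantile_huber(u, tau, kappa):
    """rho^kappa_tau(u); assumed to reduce to the quantile loss when kappa == 0."""
    weight = abs(tau - (1.0 if u < 0 else 0.0))
    if kappa == 0:
        return weight * abs(u)
    return weight * huber(u, kappa)


def qr_q_learning_loss(theta, theta_next, a, r, gamma, kappa=1.0):
    """Loss for one transition (s, a, r, s').

    theta[b][j]      : j-th quantile estimate theta_j(s, b)
    theta_next[b][j] : j-th quantile estimate theta_j(s', b)
    """
    n = len(theta[a])
    # Q(s', a') = sum_j q_j * theta_j(s', a') with q_j = 1/N
    q_next = [sum(row) / n for row in theta_next]
    # a* = argmax_a' Q(s', a')
    a_star = max(range(len(q_next)), key=q_next.__getitem__)
    # T theta_j = r + gamma * theta_j(s', a*), for all j
    target = [r + gamma * t for t in theta_next[a_star]]
    # Assumed quantile midpoints tau_hat_i = (2i - 1) / (2N), i = 1..N
    tau_hat = [(2 * i + 1) / (2 * n) for i in range(n)]
    loss = 0.0
    for i in range(n):
        # E_j is taken as the average over the N target quantiles
        loss += sum(
            quantile_huber(target[j] - theta[a][i], tau_hat[i], kappa)
            for j in range(n)
        ) / n
    return loss


# Two actions, N = 2 quantiles per action
theta = [[0.0, 1.0], [0.5, 0.5]]
theta_next = [[1.0, 3.0], [0.0, 1.0]]
print(qr_q_learning_loss(theta, theta_next, a=0, r=1.0, gamma=0.5, kappa=1.0))
# prints 0.796875
```

In practice the target $\mathcal{T}\theta_j$ is treated as a constant and gradients flow only through $\theta_i(s, a)$; the sketch above computes the loss value only and leaves differentiation to whatever framework is used.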