| Item | Do |
|------|----|
| 1 | Initialize gn, Iterations, h, and the parameters r and v. |
| 2 | Evaluate the initial policy $\tilde{\mu}(\cdot, v)$ through (16). |
| 3 | Update the parameters r via (17). |
| 4 | Compute the functions $Q(\cdot)$ by (19). |
| 5 | Update the policy $\bar{\mu}(\cdot)$ using (20). |
| 6 | Update the parameters v of $\tilde{\mu}(\cdot, v)$ by (22). |
| 7 | Update the function gn. |
| 8 | Go to item 2 and repeat items 2 to 7 until the specified number of Iterations is completed. |
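The control flow of the items above can be sketched as a simple loop. This is only a skeleton under stated assumptions: equations (16), (17), (19), (20) and (22) are not reproduced here, so each step is a hypothetical stub that merely counts its own calls. The names `run_iterations`, `make_stub` and `STEP_NAMES`, and the initial values, are illustrative and do not come from the source.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], State]


def run_iterations(
    state: State, steps: List[Tuple[str, Step]], iterations: int
) -> Tuple[State, List[str]]:
    """Apply the ordered steps (items 2 to 7) `iterations` times (item 8)."""
    log: List[str] = []
    for _ in range(iterations):
        for name, step in steps:
            state = step(state)  # each step returns an updated copy of the state
            log.append(name)
    return state, log


def make_stub(name: str) -> Step:
    """Hypothetical stand-in: counts calls instead of applying the real equation."""

    def step(state: State) -> State:
        updated = dict(state)
        updated[name] = updated.get(name, 0) + 1
        return updated

    return step


# Items 2 to 7, in the order given in the table (names are assumptions).
STEP_NAMES = [
    "evaluate_policy_16",
    "update_r_17",
    "compute_Q_19",
    "update_policy_20",
    "update_v_22",
    "update_gn",
]

# Item 1: initialization with placeholder values (not taken from the source).
initial_state: State = {"gn": 0.0, "h": 0.0, "r": [0.0], "v": [0.0]}
steps = [(name, make_stub(name)) for name in STEP_NAMES]
final_state, log = run_iterations(initial_state, steps, iterations=3)
```

To use the skeleton, each stub would be replaced by a function that reads the quantities it needs from the state and writes back the result of the corresponding equation. Passing the state through every step keeps the ordering of items 2 to 7 explicit and makes each update easy to check in isolation.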