| Item | Action |
|---|---|
| 1 | Initialize: gn, Iterations, h, parameters r and v. |
| 2 | Evaluate the initial policy through (16). |
| 3 | Update the parameters r via (17). |
| 4 | Compute the functions Q(·) by (19). |
| 5 | Update the policy using (20). |
| 6 | Update the parameters v by (22). |
| 7 | Update function gn. |
| 8 | Go to item 2 and repeat items 2 through 7 until the specified number of Iterations is completed. |
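
The control flow of the table above can be sketched as a fixed-length loop over items 2 to 7. This is only a minimal illustration of the ordering: equations (16) to (22) are not reproduced here, so each item is represented by a hypothetical placeholder function that merely records that it ran, and the names `run_algorithm`, `make_stub`, and `ITEM_NAMES` are assumptions introduced for this sketch.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], State]


def run_algorithm(state: State, steps: List[Tuple[str, Step]], iterations: int) -> State:
    """Apply the given steps in order, repeating the whole sequence `iterations` times."""
    for _ in range(iterations):
        for _name, step in steps:
            state = step(state)
    return state


def make_stub(name: str) -> Step:
    """Build a placeholder step (hypothetical) that only appends its name to a trace."""
    def stub(state: State) -> State:
        new_state = dict(state)
        new_state["trace"] = state["trace"] + [name]
        return new_state
    return stub


# Items 2-7 of the table, in order. Real implementations would apply
# equations (16), (17), (19), (20), (22) and the update of gn.
ITEM_NAMES = [
    "evaluate_policy",  # item 2, via (16)
    "update_r",         # item 3, via (17)
    "compute_Q",        # item 4, by (19)
    "update_policy",    # item 5, using (20)
    "update_v",         # item 6, by (22)
    "update_gn",        # item 7
]

steps = [(name, make_stub(name)) for name in ITEM_NAMES]

# Item 1: initialization (here only the trace; gn, h, r and v are omitted).
initial_state: State = {"trace": []}
final_state = run_algorithm(initial_state, steps, iterations=3)
```

Replacing each stub with a function that implements the corresponding equation would leave the loop itself unchanged, since item 8 only prescribes the repetition.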