1: | Initialize |
2: | for episode = 1, M do |
3: | Initialize s |
4: | Repeat |
5: | Choose |
6: | Choose a from s using policy derived from |
7: | Take action a, observe |
8: | |
9: | |
10: | |
11: | until |
12: | end for |
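
The listing above has the shape of a tabular temporal-difference control loop, but the symbols in several steps (the second "Choose", the observed quantities, and steps 8 to 10) did not survive in this copy. As a rough illustration only, the sketch below fills those gaps with the standard tabular Q-learning update and an epsilon-greedy policy; this is a swapped-in, well-known technique and may differ from what the original steps specified. The function names `q_learning` and `chain_step`, the five-state chain environment, and all hyperparameter values are hypothetical and chosen only to make the example runnable.

```python
import random


def chain_step(s, a):
    """Hypothetical 5-state chain: action 0 moves left, action 1 moves right.
    Entering state 4 yields reward 1 and ends the episode."""
    s_next = max(0, s - 1) if a == 0 else s + 1
    done = s_next == 4
    return s_next, (1.0 if done else 0.0), done


def q_learning(step, n_states, n_actions, start_state,
               episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Initialize the action-value table (assumed: all zeros).
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):              # for episode = 1, M do
        s = start_state                    # Initialize s
        done = False
        while not done:                    # Repeat ... until s is terminal (assumed)
            # Choose a from s using a policy derived from Q
            # (epsilon-greedy with random tie-breaking is an assumption).
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            # Take action a, observe the next state and reward.
            s_next, r, done = step(s, a)
            # Assumed update: standard Q-learning target, no bootstrap at terminal.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
    return Q


Q = q_learning(chain_step, n_states=5, n_actions=2, start_state=0)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy)  # prints [1, 1, 1, 1]: move right in every non-terminal state
```

On this toy chain the learned values for moving right approach 0.729, 0.81, 0.9 and 1.0 for states 0 to 3, which is the discounted distance to the goal under `gamma = 0.9`.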