1: | Initialize |
2: | for episode = 1, M do |
3: | Initialize s |
4: | Repeat |
5: | Choose |
6: | Choose a from s using policy derived from |
7: | Take action a, observe |
8: | |
9: | |
10: | Train network using |
11: | |
12: | until |
13: | end for |
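The loop structure above can be sketched in Python. This is only an illustration under stated assumptions, not the original method: the symbols missing from the listing are not recoverable, so the sketch fills them with standard Q-learning choices (an epsilon-greedy policy, a one-step TD target, termination when a goal state is reached). The corridor environment, the names `step` and `train`, the hyperparameters, and the table that stands in for the network are all hypothetical.

```python
import random

# Hypothetical toy environment (not from the source): a corridor with
# states 0..4; an episode ends when the agent reaches state 4.
N_STATES = 5
ACTIONS = (-1, +1)  # move left, move right


def step(s, a):
    """Take action a in state s; return (reward, next_state, done)."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    done = s_next == N_STATES - 1
    return (1.0 if done else 0.0), s_next, done


def train(episodes=200, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # "Initialize": a table stands in for the network's parameters (assumption).
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):          # for episode = 1, M do
        s, done = 0, False             # Initialize s
        while not done:                # Repeat ... until s is terminal (assumed)
            # Choose a from s using an epsilon-greedy policy (assumed),
            # breaking ties between equal values at random.
            if rng.random() < epsilon:
                i = rng.randrange(len(ACTIONS))
            else:
                best = max(q[s])
                i = rng.choice([k for k, v in enumerate(q[s]) if v == best])
            # Take action a, observe reward and next state.
            r, s_next, done = step(s, ACTIONS[i])
            # "Train network": one-step TD update toward an assumed target.
            target = r if done else r + gamma * max(q[s_next])
            q[s][i] += alpha * (target - q[s][i])
            s = s_next                 # advance to the next state
    return q


if __name__ == "__main__":
    for state, values in enumerate(train()):
        print(state, [round(v, 3) for v in values])
```

Calling `train()` returns the learned value table. Replacing the table with a function approximator changes only the "Initialize" and "Train network" steps; the episode loop stays the same.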