Initialize Q values
Repeat t times (t = number of learning episodes)
    Select a random state s
    Repeat until the end of the learning episode
        Select an action a
        Receive an immediate reward r
        Observe the next state s'
        Update the Q table according to the update rule
        Set s = s'
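
The steps above can be sketched in Python as a minimal tabular example. Several details are assumptions rather than part of the listing: the update rule is taken to be the standard Q-learning rule Q(s, a) += alpha * (r + gamma * max Q(s', .) - Q(s, a)), actions are chosen epsilon-greedily, an episode ends at a terminal state or after `max_steps` steps, and the five-state chain environment (`chain_step`) along with all parameter names and values is hypothetical and used only for illustration.

```python
import random

def q_learning(n_states, n_actions, step, is_terminal, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning following the listing (assumed standard update rule)."""
    rng = random.Random(seed)
    # Initialize Q values (assumption: all zeros).
    Q = [[0.0] * n_actions for _ in range(n_states)]
    start_states = [x for x in range(n_states) if not is_terminal(x)]
    for _ in range(episodes):  # repeat t times
        # Select a random (non-terminal) state s.
        s = rng.choice(start_states)
        for _ in range(max_steps):  # repeat until the end of the episode
            # Select an action a (assumption: epsilon-greedy).
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            # Receive an immediate reward r and observe the next state.
            r, s_next = step(s, a)
            # Update the Q table (assumed standard Q-learning rule).
            target = r if is_terminal(s_next) else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            # Set the current state to the next state.
            s = s_next
            if is_terminal(s):
                break
    return Q

# Hypothetical environment: a chain of states 0..4. Action 0 moves left,
# action 1 moves right, and entering state 4 yields a reward of 1.
def chain_step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
    return (1.0 if s_next == 4 else 0.0), s_next

Q = q_learning(5, 2, chain_step, lambda s: s == 4)
# Greedy action in each non-terminal state after learning.
greedy = [max(range(2), key=lambda i: Q[s][i]) for s in range(4)]
print(greedy)
```

In this sketch the greedy policy read off the learned table should prefer moving right in every non-terminal state, since that is the only way to reach the rewarded state. Other exploration schemes or learning-rate schedules would fit the same loop structure.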