Algorithm 1. Dynamic programming algorithm

Require: L , T , A , P p i c k u p , P d e s t , d s e e k , d d r i v e r , t s e e k , t d r i v e r , D y , p k

Ensure: The optimal policy π

1: p k is a | L | × 1 , V is a | L | × | T | matrix; p k 0 , V 0

2: for |L| to 1 do

3: p L p k

4: end for

5: for t = T − 1 to 1 do

6: for l = 1 to |L| do

7: for d in len(d) do

8: for a A do

9: a max find a max V ( s , a ) can be computed by Equation (4)

10: π a max

11: V ( s ) max V ( s , a max )

12: end for

13: end for

14: end for

15: end for

16: Return π