Require: $p(T)$: distribution over tasks
Require: $α$, $β$: step size hyperparameters
1: randomly initialize $θ$
2: while not done do
3:     Sample batch of tasks $T_{i} \sim p(T)$
4:     for all $T_{i}$ do
5:         Evaluate $\nabla_{θ} L_{T_{i}}(f_{θ})$ with respect to $K$ examples
6:         Compute adapted parameters with gradient descent: $θ'_{i} = θ - α \nabla_{θ} L_{T_{i}}(f_{θ})$
7:     end for
8:     Update $θ \leftarrow θ - β \nabla_{θ} \sum_{T_{i} \sim p(T)} L_{T_{i}}(f_{θ'_{i}})$
9: end while
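The procedure above can be illustrated with a minimal sketch on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the text: the model is the scalar $f_{θ}(x) = θx$, each task is a regression $y = ax$ with a slope $a$ drawn uniformly from $[1, 3]$, the loss is mean squared error, and the meta-update is evaluated on a fresh query set from the same task. The helper names (`sample_task`, `loss_grad`, `maml`) are hypothetical. Because the model is scalar, the derivative of $θ'_{i}$ with respect to $θ$ is available in closed form, so the second-order term in step 8 is computed exactly without an autodiff library.

```python
import random


def loss(theta, xs, ys):
    """Mean squared error of the toy model f_theta(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loss_grad(theta, xs, ys):
    """Derivative of the mean squared error with respect to theta."""
    return 2.0 * sum(x * (theta * x - y) for x, y in zip(xs, ys)) / len(xs)


def sample_task(rng, k):
    """Hypothetical task: y = a * x with slope a ~ Uniform(1, 3).

    Returns a (support, query) pair, each holding k examples.
    """
    a = rng.uniform(1.0, 3.0)

    def draw():
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        return xs, [a * x for x in xs]

    return draw(), draw()


def maml(alpha=0.1, beta=0.05, k=5, batch_size=8, steps=500, seed=0):
    rng = random.Random(seed)
    theta = rng.uniform(-1.0, 1.0)            # step 1: random initialization
    for _ in range(steps):                    # step 2: fixed budget stands in for "not done"
        meta_grad = 0.0
        for _ in range(batch_size):           # steps 3-4: batch of tasks
            (xs, ys), (xq, yq) = sample_task(rng, k)
            g = loss_grad(theta, xs, ys)      # step 5: gradient on K examples
            theta_i = theta - alpha * g       # step 6: adapted parameter
            # d(theta_i)/d(theta) = 1 - alpha * (second derivative of support loss),
            # and the second derivative of the MSE here is 2 * mean(x^2).
            dtheta_i = 1.0 - alpha * 2.0 * sum(x * x for x in xs) / len(xs)
            # Chain rule: gradient of the post-adaptation loss with respect to theta.
            meta_grad += loss_grad(theta_i, xq, yq) * dtheta_i
        theta -= beta * meta_grad             # step 8: meta-update over the batch
    return theta


if __name__ == "__main__":
    print("meta-learned theta:", maml())
```

In this toy setting the meta-learned $θ$ should settle near the middle of the slope range, since that is the initialization from which a single gradient step reaches any sampled task most cheaply. For a real model with vector-valued $θ$, the closed-form `dtheta_i` would be replaced by differentiating through the inner update with an automatic differentiation library.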