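Require: $p(\mathcal{T})$: distribution over tasks
Require: $\alpha$, $\beta$: step size hyperparameters
1: randomly initialize $\theta$
2: while not done do
3:   Sample batch of tasks $\mathcal{T}_i \sim p(\mathcal{T})$
4:   for all $\mathcal{T}_i$ do
5:     Evaluate $\nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$ with respect to $K$ examples
6:     Compute adapted parameters with gradient descent: $\theta'_i = \theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)$
7:   end for
8:   Update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta'_i})$
9: end while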
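The loop in steps 1 to 9 can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: it assumes a hypothetical scalar model f_θ(x) = θ·x with mean squared error, a hypothetical task distribution (regressing y = a·x with slope a drawn uniformly from [1, 3]), and fresh examples for the outer loss in step 8. With this model the gradient and second derivative are available in closed form, so the outer gradient through the inner update can be written out by the chain rule. All function names and default hyperparameters are illustrative.

```python
import random

def loss(theta, xs, ys):
    """Mean squared error of the assumed scalar model f_theta(x) = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(theta, xs, ys):
    """Analytic derivative of the loss with respect to theta."""
    return sum(2 * x * (theta * x - y) for x, y in zip(xs, ys)) / len(xs)

def hess(xs):
    """Analytic second derivative of the loss (independent of theta here)."""
    return sum(2 * x * x for x in xs) / len(xs)

def task_meta_grad(theta, alpha, xs, ys, xq, yq):
    """Derivative of L(f_{theta'_i}) with respect to theta for one task."""
    theta_i = theta - alpha * grad(theta, xs, ys)      # step 6: adapted parameters
    # Chain rule: d theta'_i / d theta = 1 - alpha * (second derivative).
    return grad(theta_i, xq, yq) * (1.0 - alpha * hess(xs))

def sample_task(rng):
    """Hypothetical p(T): each task is y = a * x with a ~ U(1, 3)."""
    a = rng.uniform(1.0, 3.0)
    def draw(k):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
        return xs, [a * x for x in xs]
    return draw

def maml(alpha=0.1, beta=0.05, k=5, batch=8, steps=500, seed=0):
    rng = random.Random(seed)
    theta = rng.uniform(-1.0, 1.0)                     # step 1: random init
    for _ in range(steps):                             # step 2: while not done
        total = 0.0
        for _ in range(batch):                         # steps 3-4: batch of tasks
            draw = sample_task(rng)
            xs, ys = draw(k)                           # step 5: K examples
            xq, yq = draw(k)                           # assumption: fresh data for step 8
            total += task_meta_grad(theta, alpha, xs, ys, xq, yq)
        theta -= beta * total                          # step 8: meta-update (sum over tasks)
    return theta
```

Calling `maml()` returns a meta-learned θ that should settle near the middle of the assumed slope range, since that is the initialization from which a single gradient step adapts best on average. For models without closed-form derivatives, the same structure would typically rely on an automatic differentiation library to differentiate through step 6.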