# Understanding Why MAML Can Adapt Fast

When MAML Can Adapt Fast and How to Assist When It Cannot

Loss surface and trajectories of MAML-trained deep linear models.

Abstract

Illustrative Results

As a small teaser, the following figure illustrates a simple failure mode of MAML: although the black and red models have the same modelling capacity, only the overparameterized one is able to adapt quickly.

Why? Please refer to our paper for detailed explanations, including theoretical results and empirical analysis on Omniglot, CIFAR-FS, and mini-ImageNet.

We also introduce two methods that improve the adaptability of gradient-based meta-learning algorithms such as MAML and ANIL: one adds extra linear layers to the model, the other learns an external meta-optimizer.

MAML fails to adapt quickly with shallow linear models (Shallow, LogR) on simple convex tasks, here linear regression and binary classification. Overparameterizing the models (Deep, LogR+LinNet) lets MAML meta-learn optimization weights that enable fast adaptation.
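To make the "same modelling capacity" point concrete, here is a minimal, dependency-free sketch showing that a stack of linear layers computes exactly the same function as one collapsed matrix, even though it has many more parameters. The layer sizes, the helper names (`matmul`, `random_matrix`), and the random weights are assumptions chosen for illustration; they are not taken from the paper's experiments.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix with entries drawn from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Hypothetical sizes: 3 input features, hidden width 4, 1 output.
w1 = random_matrix(4, 3, rng)  # first linear layer
w2 = random_matrix(4, 4, rng)  # extra linear layer (no nonlinearity)
w3 = random_matrix(1, 4, rng)  # output layer

x = random_matrix(3, 1, rng)   # one input, as a column vector

# Deep linear model: apply the layers one after another.
deep_out = matmul(w3, matmul(w2, matmul(w1, x)))

# Equivalent shallow model: collapse the stack into a single 1 x 3 matrix.
w_shallow = matmul(w3, matmul(w2, w1))
shallow_out = matmul(w_shallow, x)

# Same function, very different parameter counts.
deep_params = sum(len(w) * len(w[0]) for w in (w1, w2, w3))
shallow_params = len(w_shallow) * len(w_shallow[0])
max_diff = abs(deep_out[0][0] - shallow_out[0][0])

print("deep params:", deep_params, "shallow params:", shallow_params)
print("output difference:", max_diff)
```

Both models represent the same set of linear maps, so any difference in how quickly they adapt must come from how gradient updates act on the extra parameters, not from added expressiveness.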

Code

The implementation of our meta-optimizer, Meta-KFO, and all baselines are available in learn2learn. We also release example implementations of Meta-KFO at:

http://github.com/Sha-Lab/kfo

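    import torch
    import learn2learn as l2l

    # MyModel is a placeholder for any torch.nn.Module you want to meta-train.
    model = MyModel()
    # Kronecker-factored linear transform applied to the gradients.
    metaopt = l2l.optim.KroneckerTransform(l2l.nn.KroneckerLinear)
    gbml = l2l.algorithms.GBML(
        module=model,
        transform=metaopt,
        lr=0.01,  # fast-adaptation learning rate
    )
    # The outer optimizer updates the model and the transform parameters.
    opt = torch.optim.SGD(gbml.parameters(), lr=0.001)

    for iteration in range(10):
        opt.zero_grad()
        task_model = gbml.clone()  # differentiable copy for this task
        # compute_loss is a placeholder for your own task loss.
        adapt_loss = compute_loss(task_model)
        task_model.adapt(adapt_loss)  # one fast-adaptation step
        eval_loss = compute_loss(task_model)
        eval_loss.backward()
        opt.step()

Example of using the GBML wrapper with Kronecker-Factored Optimizers (KFO).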
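As background on the term "Kronecker-factored", the sketch below illustrates the general linear-algebra identity behind such parameterizations: multiplying a gradient matrix by a small left and right factor is equivalent to applying one large Kronecker-product matrix to the flattened gradient. This is not Meta-KFO's implementation; the matrices `left`, `right`, and `grad` and all helper functions are hypothetical values chosen for illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


# Hypothetical 2 x 3 gradient of a small linear layer.
grad = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]

# Hypothetical factors: a 2 x 2 left factor and a 3 x 3 right factor.
left = [[2.0, 0.0],
        [1.0, 1.0]]
right = [[1.0, 0.0, 0.0],
         [0.0, 0.5, 0.0],
         [1.0, 0.0, 2.0]]

# Factored form: two small matrix products.
factored = matmul(matmul(left, grad), right)
factored_flat = [v for row in factored for v in row]

# Full form: one 6 x 6 matrix applied to the row-major flattened gradient.
grad_flat = [[v] for row in grad for v in row]
full = matmul(kron(left, transpose(right)), grad_flat)
full_flat = [row[0] for row in full]

# The factors need 4 + 9 entries; the full matrix needs 6 * 6.
factored_params = len(left) * len(left[0]) + len(right) * len(right[0])
full_params = len(grad_flat) ** 2

print("factored params:", factored_params, "full params:", full_params)
print("match:", all(abs(p - q) < 1e-9 for p, q in zip(factored_flat, full_flat)))
```

The parameter saving grows quickly with layer size, which is the usual motivation for factoring a linear map over gradients this way.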

Reference

S. M. R. Arnold, S. Iqbal, F. Sha, When MAML Can Adapt Fast and How to Assist When It Cannot. AISTATS 2021.


