Uniform Sampling over Episode Difficulty



Can we improve transfer accuracy by changing how we sample few-shot classification episodes? Yes, here’s how.

Summary

Episodic training is a core component of few-shot learning, in which models are trained on tasks with limited labelled data. Despite its success, episodic training remains largely understudied, prompting us to ask: what is the best way to sample episodes? In this paper, we first propose a method to approximate episode sampling distributions based on their difficulty. Building on this method, we perform an extensive analysis and find that sampling uniformly over episode difficulty outperforms other sampling schemes, including curriculum and easy-/hard-mining. Because the proposed sampling method is algorithm-agnostic, we can leverage these insights to improve few-shot learning accuracies across many episodic training algorithms. We demonstrate the efficacy of our method across popular few-shot learning datasets, algorithms, network architectures, and protocols.


Pseudocode for episodic training with importance sampling.

Code

# Pseudocode: reweight each episode's loss by an importance-sampling
# weight so that, in expectation, episodes are uniform over difficulty.
sampler.reset()                       # start a new batch of episodes
for episode in batch:
  loss = compute_loss(episode)        # forward pass on the episode
  is_weight = sampler.weight(loss)    # importance-sampling weight
  sampler.update(loss)                # refresh the difficulty estimate
  (is_weight * loss).backward()       # accumulate weighted gradients
for p in model.parameters():          # normalize accumulated gradients
  p.grad /= sampler.ess()             # by the effective sample size
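The `sampler` above is a placeholder interface. As a rough sketch of one way to fill it in (the class name, the online Gaussian fit of difficulty, and all constants are illustrative assumptions, not the authors' released implementation): difficulty is proxied by the episode loss, the loss distribution under the default sampler is estimated online, and the weight reweights toward a uniform target over difficulty.

```python
import math

class UniformDifficultySampler:
    """Sketch of a difficulty-uniform importance sampler (assumed API)."""

    def __init__(self, momentum=0.9):
        self.momentum = momentum
        self.mean = 0.0   # running mean of episode difficulty (loss)
        self.var = 1.0    # running variance of episode difficulty
        self.weights = []

    def reset(self):
        # Forget the weights from the previous batch of episodes.
        self.weights = []

    def weight(self, loss):
        # Importance weight proportional to 1 / proposal density
        # (uniform target over difficulty); clipping in practice
        # would guard against exploding weights in the tails.
        std = math.sqrt(self.var) + 1e-8
        z = (loss - self.mean) / std
        pdf = math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))
        w = 1.0 / (pdf + 1e-8)
        self.weights.append(w)
        return w

    def update(self, loss):
        # Exponential-moving estimate of the difficulty distribution.
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * loss
        self.var = m * self.var + (1 - m) * (loss - self.mean) ** 2

    def ess(self):
        # Effective sample size of the current batch's weights:
        # (sum w)^2 / (sum w^2), which is at most len(self.weights).
        s = sum(self.weights)
        s2 = sum(w * w for w in self.weights)
        return (s * s) / (s2 + 1e-8)
```

Normalizing by the effective sample size rather than the raw weight sum keeps the gradient scale stable when a few episodes receive very large weights.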


Reference

Please cite this work as

S. M. R. Arnold, G. S. Dhillon, A. Ravichandran, S. Soatto, Uniform Sampling over Episode Difficulty. NeurIPS 2021.

or with the following BibTeX entry.

@inproceedings{arnold2021uniform,
 author = {Arnold, S\'{e}bastien M. R. and Dhillon, Guneet S. and Ravichandran, Avinash and Soatto, Stefano},
 title = {Uniform Sampling over Episode Difficulty},
 booktitle = {Advances in Neural Information Processing Systems},
 volume = {34},
 year = {2021}
}


Contact
Séb Arnold - seb.arnold@usc.edu
