
In-context Learning (ICL)

1. Definition

In-context learning (ICL) was defined in "Language Models are Few-Shot Learners" by Brown et al., the paper that introduced GPT-3. The authors define ICL as:

During unsupervised pre-training, a language model develops a broad set of skills and pattern recognition abilities. It then uses these abilities at inference time to rapidly adapt to or recognize the desired task. We use the term “in-context learning” to describe the inner loop of this process, which occurs within the forward-pass upon each sequence.
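In practice, ICL amounts to placing labeled demonstrations in the prompt and letting the model continue the pattern, with no weight updates. Below is a minimal sketch of few-shot prompt construction for a sentiment task; the task, examples, and prompt format are illustrative choices, not taken from the paper.

```python
# A minimal sketch of in-context learning as few-shot prompting:
# the "learning" happens entirely in the forward pass, conditioned
# on demonstrations placed in the prompt. Examples are made up.

examples = [
    ("The movie was fantastic.", "positive"),
    ("I regret buying this.", "negative"),
    ("Absolutely loved the soundtrack.", "positive"),
]
query = "The plot made no sense at all."

# Each (input, label) pair becomes one demonstration in the context.
demonstrations = "\n".join(
    f"Review: {text}\nSentiment: {label}" for text, label in examples
)
prompt = f"{demonstrations}\nReview: {query}\nSentiment:"

print(prompt)  # feed this to any autoregressive LM; no weights are updated
```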

2. ICL vs. Model Size

Akyürek et al. observe that ICL exhibits algorithmic phase transitions as model depth increases (a numerical sketch of the two regimes follows the list):

  • One-layer transformers’ ICL behavior approximates a single step of gradient descent, while wider and deeper transformers match ordinary least squares or ridge regression solutions.
  • This suggests that if small models implement simple learning algorithms in-context, larger models may implement more sophisticated ones during ICL.
  • The smallest models do not appear to learn from in-context examples at all, while larger ones do.
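To make the first bullet concrete, here is a numerical sketch of the two algorithms that Akyürek et al. associate with shallow vs. deeper transformers doing in-context linear regression. The data is synthetic and the transformer itself is not simulated; we only compute the solutions it is hypothesized to approximate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4                      # number of in-context examples, feature dim
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# One step of gradient descent from w = 0 on the squared loss
# (what a one-layer transformer's ICL is said to approximate).
lr = 0.1
w0 = np.zeros(d)
w_gd = w0 + lr * X.T @ (y - X @ w0) / n

# Ridge regression closed form (what wider/deeper models are said to match);
# with lam -> 0 this recovers ordinary least squares.
lam = 0.01
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

print("one GD step :", np.round(w_gd, 3))
print("ridge       :", np.round(w_ridge, 3))
print("true weights:", np.round(w_true, 3))
```

Running this shows the gap the phase transition is about: a single gradient step only moves partway toward the true weights, while the ridge solution essentially recovers them.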
In ICL over Graphs: PRODIGY, I covered the paper "PRODIGY: Enabling In-context Learning Over Graphs", which extends the concept of ICL to graphs.