In-context Learning (ICL)
1. Definition
In-context learning (ICL) was defined in "Language Models are Few-Shot Learners" by Brown et al., the paper that introduced GPT-3. The authors describe ICL as follows:
> During unsupervised pre-training, a language model develops a broad set of skills and pattern recognition abilities. It then uses these abilities at inference time to rapidly adapt to or recognize the desired task. We use the term “in-context learning” to describe the inner loop of this process, which occurs within the forward-pass upon each sequence.
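To make the definition concrete, here is a minimal sketch of what ICL looks like in practice: a few labelled demonstrations and a query are packed into a single prompt, and the model is expected to supply the answer in its continuation, without any weight updates. The sentiment-classification task, the example reviews, and the prompt layout below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of in-context learning as few-shot prompting.
# All "learning" happens inside a single forward pass over this prompt;
# the model's weights are never updated. (Task and examples are made up
# for illustration.)

demonstrations = [
    ("The movie was fantastic.", "positive"),
    ("I would not recommend this restaurant.", "negative"),
    ("The concert exceeded my expectations.", "positive"),
]
query = "The service was painfully slow."

prompt = "Classify the sentiment of each review.\n\n"
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"

print(prompt)
# Feeding this prompt to a pretrained language model and reading its
# continuation (ideally "negative") is the entire ICL procedure:
# no gradient updates, no fine-tuning.
```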
2. ICL vs. Model Size
Akyürek et al. observe that ICL exhibits algorithmic phase transitions as model depth increases:
- One-layer transformers’ ICL behavior approximates a single step of gradient descent, while wider and deeper transformers match ordinary least squares or ridge regression solutions (a toy comparison of these learners follows this list).
- This suggests that if small models implement simple learning algorithms in-context, larger models may implement more sophisticated ones.
- More broadly, smaller models show little ability to learn from in-context examples, while larger models benefit from them substantially.
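To make the comparison in the first bullet concrete, the toy NumPy sketch below (my own illustration, not the paper's transformer experiments) computes the learners that Akyürek et al. find ICL approximating on synthetic linear-regression tasks: a single gradient-descent step, ridge regression, and ordinary least squares. The problem sizes, learning rate, and regularization strength are arbitrary assumptions.

```python
# Toy comparison of the learners that shallow vs. deeper transformers'
# ICL is reported to approximate on linear regression (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 4                                   # in-context examples, feature dim
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)      # noisy linear targets

# One gradient-descent step on the mean squared error from a zero init
# (what one-layer transformers' ICL behavior resembles).
lr = 0.1
w0 = np.zeros(d)
grad = (2 / n) * X.T @ (X @ w0 - y)
w_gd = w0 - lr * grad

# Closed-form ridge and ordinary-least-squares solutions
# (what wider and deeper models approach).
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

for name, w in [("1-step GD", w_gd), ("ridge", w_ridge), ("OLS", w_ols)]:
    print(f"{name:10s} distance to true weights: {np.linalg.norm(w - w_true):.3f}")
```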
3. Related Discussion
- In ICL over Graphs: PRODIGY, I covered the paper "PRODIGY: Enabling In-context Learning Over Graphs", which extends the concept of ICL to graphs.