LLM Decoding: Inference in Autoregressive Language Models
Most large language models (LLMs) today are autoregressive models. Before LLMs, NLP was fragmented — different problems like text classification, translation, summarization, and question answering all needed their own models, datasets, and training tricks. But then came GPT-2, and everything changed. GPT-2 is an autoregressive model trained purely on text generation — predicting the next word in a sequence — that’s called decoding.Surprisingly, this simple setup made it capable of handling a wide range of NLP tasks, often without fine-tuning....