Fine-tuning (deep learning)

In deep learning, fine-tuning is an approach to transfer learning in which the parameters of a pre-trained model are trained on new data.^[1] Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (not updated during the backpropagation step).^[2] A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter–efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.^[3]

For some architectures, such as convolutional neural networks, it is common to keep the earlier layers (those closest to the input layer) frozen because they capture lower-level features, while later layers often discern high-level features that can be more related to the task that the model is trained on.^[2]^[4]

Models that are pre-trained on large and general corpora are usually fine-tuned by reusing the model's parameters as a starting point and adding a task-specific layer trained from scratch.^[5] Fine-tuning the full model is common as well and often yields better results, but it is more computationally expensive.^[6]

Fine-tuning is typically accomplished with supervised learning, but there are also techniques to fine-tune a model using weak supervision.^[7] Fine-tuning can be combined with a reinforcement learning from human feedback-based objective to produce language models like ChatGPT (a fine-tuned version of GPT-3) and Sparrow.^[8]^[9]

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]