Fine-tuning (machine learning)

Short description: Machine learning technique

In machine learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained on new data.[1] Fine-tuning can be done on the entire neural network, or on only a subset of its layers, in which case the layers that are not being fine-tuned are "frozen" (not updated during the backpropagation step).[2] A model may also be augmented with "adapters" that consist of far fewer parameters than the original model, and fine-tuned in a parameter-efficient way by tuning the weights of the adapters and leaving the rest of the model's weights frozen.[3]

For some architectures, such as convolutional neural networks, it is common to keep the earlier layers (those closest to the input layer) frozen because they capture lower-level features, while later layers often discern high-level features that can be more related to the task that the model is trained on.[2][4]
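As a minimal illustration of freezing, the following Python sketch fine-tunes a toy two-layer model in which the earlier layer is frozen and only the later layer is updated. The model, the parameter names (`layer1.w`, `layer2.w`), the data, and the hyperparameters are all hypothetical and chosen only to make the mechanism visible; it is not taken from any of the cited sources.

```python
# Toy model: y = w2 * relu(w1 * x). All names and values are hypothetical.
def forward(params, x):
    h = max(0.0, params["layer1.w"] * x)    # "early" layer (closest to the input)
    return params["layer2.w"] * h, h        # "late" layer

def fine_tune(params, frozen, data, lr=0.05, epochs=50):
    """Gradient descent on squared error that skips frozen parameters."""
    params = dict(params)  # copy, so the pre-trained weights stay intact
    for _ in range(epochs):
        for x, target in data:
            y, h = forward(params, x)
            err = y - target
            # Gradients of (y - target)**2 with respect to each weight.
            grads = {
                "layer2.w": 2 * err * h,
                "layer1.w": 2 * err * params["layer2.w"] * (x if h > 0 else 0.0),
            }
            for name, g in grads.items():
                if name not in frozen:       # frozen weights receive no update
                    params[name] -= lr * g
    return params

def loss(params, data):
    """Mean squared error over the dataset."""
    return sum((forward(params, x)[0] - t) ** 2 for x, t in data) / len(data)

pretrained = {"layer1.w": 1.0, "layer2.w": 0.5}
new_task = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # new data: target = 2 * x

tuned = fine_tune(pretrained, frozen={"layer1.w"}, data=new_task)
print(tuned["layer1.w"])                            # unchanged: 1.0
print(loss(pretrained, new_task) > loss(tuned, new_task))  # True
```

Passing an empty set for `frozen` would correspond to fine-tuning the whole model.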

Models that are pre-trained on large and general corpora are usually fine-tuned by reusing the model's parameters as a starting point and adding a task-specific layer trained from scratch.[5] Fine-tuning the full model is common as well and often yields better results, but it is more computationally expensive.[6]

Fine-tuning is typically accomplished with supervised learning, but there are also techniques to fine-tune a model using weak supervision.[7] Fine-tuning can be combined with an objective based on reinforcement learning from human feedback (RLHF) to produce language models such as ChatGPT (fine-tuned from a model in the GPT-3.5 series) and Sparrow.[8][9]

Techniques

Low-rank adaptation

Low-rank adaptation (LoRA) is an adapter-based technique for efficiently fine-tuning models. The basic idea is to keep the original weight matrix frozen and learn a low-rank update, expressed as the product of two much smaller matrices, that is added to it.[10]

Adapter-based fine-tuning allows for performance approaching that of full-model fine-tuning while drastically reducing the number of modified weights that must be saved to disk. A language model with billions of parameters may be LoRA fine-tuned with only a few million parameters to save, or even fewer.
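The parameter savings can be made concrete with a short Python sketch. It assumes a single weight matrix W0 of shape d_out × d_in and a low-rank update B·A with B of shape d_out × r and A of shape r × d_in; the helper names, the toy dimensions, and the `scale` argument (standing in for a scaling factor on the update) are illustrative assumptions, not the API of any library.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w0, b, a, scale=1.0):
    """Return W0 + scale * (B @ A); W0 itself is never modified."""
    delta = matmul(b, a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(w0, delta)]

def lora_param_count(d_out, d_in, rank):
    """Trainable parameters in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)

# Toy example: a 6 x 4 frozen matrix with a rank-2 adapter.
d_out, d_in, rank = 6, 4, 2
rng = random.Random(0)
w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]    # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]   # trainable
b = [[0.0] * rank for _ in range(d_out)]   # trainable; zeros, so the update starts at 0

# Before any training step the adapted layer equals the original layer.
print(lora_effective_weight(w0, b, a) == w0)   # True

# For one 4096 x 4096 matrix, a rank-8 adapter stores far fewer values.
print(4096 * 4096)                       # 16777216
print(lora_param_count(4096, 4096, 8))   # 65536
```

Only `a` and `b` would be updated and saved during fine-tuning, which is why the stored adapter is so much smaller than the model it modifies.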

LoRA-based fine-tuning has become popular in the Stable Diffusion community.[11] Support for LoRA is being integrated into the Diffusers library from Hugging Face.[12] Support for LoRA and similar techniques is also available for a wide range of other models through Hugging Face's Parameter-Efficient Fine-Tuning (PEFT) package.[13]

Quantization

Standard techniques for fine-tuning large language models require large quantities of GPU memory. For instance, the larger variants of LLaMA, a popular base model for fine-tuning in 2023, do not fit onto even the largest consumer GPU at 16-bit precision. Memory constraints have motivated research into quantization of the neural networks underlying language models, such as the QLoRA method, which quantizes a language model's weights into a 4-bit representation (the paper's 4-bit NormalFloat data type), enabling a 65B-parameter LLaMA model to be fine-tuned on a single GPU with 48 GB of GPU memory.[14]
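A rough back-of-the-envelope calculation shows why lower precision matters. The Python sketch below estimates the memory taken by the weights alone (ignoring activations, gradients, optimizer state, and quantization metadata) and then demonstrates a simple uniform "absmax" 4-bit quantizer. The quantizer is a deliberately simplified stand-in and is not the data type used by QLoRA; the function names and sample values are hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

def quantize_block(values, bits=4):
    """Uniform absmax quantization of one block of weights to signed integers."""
    levels = 2 ** (bits - 1) - 1                     # 7 for 4 bits
    absmax = max(abs(v) for v in values) or 1.0      # avoid dividing by zero
    scale = absmax / levels
    codes = [round(v / scale) for v in values]       # integers in [-levels, levels]
    return codes, scale

def dequantize_block(codes, scale):
    """Recover approximate floating-point weights from integer codes."""
    return [c * scale for c in codes]

print(weight_memory_gb(65_000_000_000, 16))   # 130.0
print(weight_memory_gb(65_000_000_000, 4))    # 32.5

block = [0.12, -0.70, 0.33, 0.04]             # hypothetical weights
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_error = max(abs(v - r) for v, r in zip(block, restored))
print(codes)                                  # small integers, one per weight
print(max_error <= scale / 2 + 1e-12)         # True: error is at most half a step
```

Under these simplifying assumptions, 65 billion weights drop from about 130 GB at 16 bits to about 32.5 GB at 4 bits, which is consistent with such a model becoming trainable on a 48 GB GPU once only small adapters are updated.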

Robustness

Fine-tuning can degrade a model's robustness to distribution shifts.[15][16] One mitigation is to linearly interpolate a fine-tuned model's weights with the weights of the original model, which can greatly increase out-of-distribution performance while largely retaining the in-distribution performance of the fine-tuned model.[17]
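The interpolation can be written as θ = (1 − α)·θ_pre + α·θ_ft, where α = 0 recovers the original model and α = 1 the fine-tuned one. The following Python sketch applies this to weights stored as a dictionary of flat lists; the storage layout, the function name, and the example values are assumptions made for illustration, and in practice α would be chosen by evaluating on held-out data.

```python
def interpolate_weights(theta_pre, theta_ft, alpha):
    """Linearly interpolate two sets of weights with the same names and shapes."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if theta_pre.keys() != theta_ft.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        name: [(1 - alpha) * p + alpha * f
               for p, f in zip(theta_pre[name], theta_ft[name])]
        for name in theta_pre
    }

# Hypothetical weights for a tiny model.
pretrained = {"layer.w": [1.0, -2.0], "layer.b": [0.0]}
fine_tuned = {"layer.w": [3.0, 0.0], "layer.b": [1.0]}

mixed = interpolate_weights(pretrained, fine_tuned, alpha=0.5)
print(mixed)   # {'layer.w': [2.0, -1.0], 'layer.b': [0.5]}
```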

Applications

Natural language processing

Fine-tuning is common in natural language processing (NLP), especially in the domain of language modeling. Large language models like OpenAI's GPT-2 can be fine-tuned on downstream NLP tasks to improve performance over that of the unmodified pre-trained model.[6]

Computer vision

OpenAI's CLIP model has been used as a base model for fine-tuning for multiple downstream applications.[18][19]

Support

Open-source models (models that have their weights available) can be fine-tuned using code similar to that used for pre-training.

Commercially offered language models may also be fine-tuned if the provider offers a fine-tuning API. As of June 19, 2023, language model fine-tuning APIs are offered by OpenAI and Microsoft Azure's Azure OpenAI Service for a subset of their models, as well as by Google Cloud Platform for some of their PaLM models, and by others.[20][21][22] Not all commercial models support fine-tuning; notably, as of that date, OpenAI's GPT-3.5 and GPT-4 did not support fine-tuning by developers.

References

  1. Quinn, Joanne (2020). Dive into deep learning: tools for engagement. Thousand Oaks, California. p. 551. ISBN 978-1-5443-6137-6. https://d2l.ai/chapter_computer-vision/fine-tuning.html#steps. Retrieved January 10, 2023. 
  2. "CS231n Convolutional Neural Networks for Visual Recognition". https://cs231n.github.io/transfer-learning/.
  3. Liu, Haokun; Tam, Derek; Muqeeth, Mohammed; Mohta, Jay; Huang, Tenghao; Bansal, Mohit; Raffel, Colin A (2022). "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning". In Koyejo, S.; Mohamed, S.; Agarwal, A. et al. Advances in Neural Information Processing Systems. 35. Curran Associates, Inc. pp. 1950–1965. https://proceedings.neurips.cc/paper_files/paper/2022/file/0cde695b83bd186c1fd456302888454c-Paper-Conference.pdf.
  4. Zeiler, Matthew D; Fergus, Rob (2013). Visualizing and Understanding Convolutional Networks. 
  5. Dodge, Jesse; Ilharco, Gabriel; Schwartz, Roy; Farhadi, Ali; Hajishirzi, Hannaneh; Smith, Noah (2020). Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping. 
  6. Dingliwal, Saket; Shenoy, Ashish; Bodapati, Sravan; Gandhe, Ankur; Gadde, Ravi Teja; Kirchhoff, Katrin (2021). Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems.
  7. Yu, Yue; Zuo, Simiao; Jiang, Haoming; Ren, Wendi; Zhao, Tuo; Zhang, Chao (2020). Fine-Tuning Pre-trained Language Model with Weak Supervision: A Contrastive-Regularized Self-Training Approach. 
  8. "Introducing ChatGPT". https://openai.com/blog/chatgpt. 
  9. Glaese, Amelia; McAleese, Nat; Trębacz, Maja; Aslanides, John; Firoiu, Vlad; Ewalds, Timo; Rauh, Maribeth; Weidinger, Laura et al. (2022). Improving alignment of dialogue agents via targeted human judgements. 
  10. Hu, Edward J.; Shen, Yelong; Wallis, Phillip; Allen-Zhu, Zeyuan; Li, Yuanzhi; Wang, Shean; Wang, Lu; Chen, Weizhu (2022-01-28). LoRA: Low-Rank Adaptation of Large Language Models. https://openreview.net/forum?id=nZeVKeeFYf9.
  11. Ryu, Simo (February 13, 2023). "Using Low-rank adaptation to quickly fine-tune diffusion models". https://github.com/cloneofsimo/lora. 
  12. Cuenca, Pedro; Paul, Sayak (January 26, 2023). "Using LoRA for Efficient Stable Diffusion Fine-Tuning". https://huggingface.co/blog/lora. 
  13. "Parameter-Efficient Fine-Tuning using 🤗 PEFT". https://huggingface.co/blog/peft. 
  14. Dettmers, Tim; Pagnoni, Artidoro; Holtzman, Ari; Zettlemoyer, Luke (2023). "QLoRA: Efficient Finetuning of Quantized LLMs". arXiv:2305.14314 [cs.LG].
  15. Radford, Alec; Kim, Jong Wook; Hallacy, Chris; Ramesh, Aditya; Goh, Gabriel; Agarwal, Sandhini; Sastry, Girish; Askell, Amanda; Mishkin, Pamela; Clark, Jack; Krueger, Gretchen; Sutskever, Ilya (2021). "Learning Transferable Visual Models From Natural Language Supervision". arXiv:2103.00020 [cs.CV].
  16. Kumar, Ananya; Raghunathan, Aditi; Jones, Robbie; Ma, Tengyu; Liang, Percy (2022). Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution. 
  17. Wortsman, Mitchell; Ilharco, Gabriel; Kim, Jong Wook; Li, Mike; Kornblith, Simon; Roelofs, Rebecca; Gontijo-Lopes, Raphael; Hajishirzi, Hannaneh; Farhadi, Ali; Namkoong, Hongseok; Schmidt, Ludwig (2022). "Robust fine-tuning of zero-shot models". arXiv:2109.01903.
  18. Baldrati, Alberto; Bertini, Marco; Uricchio, Tiberio; Del Bimbo, Alberto (2022). "Conditioned and Composed Image Retrieval Combining and Partially Fine-Tuning CLIP-Based Features". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. pp. 4959–4968.
  19. Rasheed, Hanoona; Khattak, Muhammad Uzair; Maaz, Muhammad; Khan, Salman; Khan, Fahad Shahbaz (2023). "Fine-Tuned CLIP Models Are Efficient Video Learners". Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 6545–6554. 
  20. "Fine-tuning". OpenAI. https://platform.openai.com/docs/guides/fine-tuning. 
  21. "Learn how to customize a model for your application". Microsoft. https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/fine-tuning. 
  22. "Tune text foundation models". Google. https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models.