11  Fine-tuning

Base LLMs, especially the largest ones, are often all you need. They have proven to be remarkably effective just with well-crafted prompts. However, there are instances where relying solely on base models, no matter how strong, may not suffice. Sometimes, you actually need to train your own chatbot.

Two examples are when you want to incorporate private knowledge or customize a model’s behavior to align with specific industry requirements. If the task you need to solve is very specific or the requirements are very strict, it may be too hard to tame a large base model purely using prompt engineering.

In these cases, you can fine-tune an existing model, effectively enhancing its performance for specialized tasks and ensuring it meets your unique needs. This can range from a full training pass over all of the model’s parameters to very efficient, laser-focused tweaks in specific parts of the model.

This chapter will explore the most important fine-tuning techniques available today, from traditional methods to more parameter-efficient approaches. Additionally, I will give you some practical tips to help you get the most out of your fine-tuning efforts.

Why Fine-Tuning?

In short, fine-tuning allows you to adapt pre-trained models, with minimal effort, to domains, tasks, or general requirements they were not initially designed for. This is achieved with additional training, using some clever techniques to make it as efficient as possible. Let’s look at some examples of cases where you may want to fine-tune a model.

Solving a Novel Task or Domain

As the landscape of machine learning evolves, new tasks frequently emerge that require specialized understanding and capabilities. Fine-tuning enables existing models to be tailored to these novel tasks without starting from scratch. For instance, a general-purpose language model can be fine-tuned for specialized applications such as legal document analysis, medical diagnosis, or customer sentiment analysis.

Training a model from the ground up is often resource-intensive, requiring substantial computational power, time, and large amounts of labeled data. Fine-tuning offers a more efficient alternative by building upon the knowledge already embedded in pre-trained models.

Since these models have already learned general language patterns and structures during their initial training phase, fine-tuning can often focus on optimizing only a fraction of the parameters relevant to the new task.

Reusing Large Models

The ability to reuse pre-trained models across different downstream tasks is one of the most significant advantages of fine-tuning. Instead of developing separate models for each task, organizations can maintain a single base model that serves multiple purposes through fine-tuning.

That is, you can have a large model trained on a large, general-purpose corpus, and a bunch of small “adapter” models that you can plug in to steer the big model as necessary.

For example, a base LLM trained on general text can be fine-tuned for tasks such as document classification or question-answering in the same domain by adding small adapters for each subtask. This means instead of dozens of big models—one for each task—you can have one even bigger model with dozens of small pluggable parts, maximizing the utility of existing resources and reducing redundancy. This not only streamlines development processes but also fosters consistency across applications.
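To make this concrete, here is a minimal sketch of adapter swapping using Hugging Face’s peft library. The base model name, adapter paths, and adapter names are all illustrative, and the sketch assumes both adapters were trained and saved beforehand:

    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # One shared base model, loaded once (name is illustrative).
    base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    # Attach a first task-specific adapter (hypothetical local paths).
    model = PeftModel.from_pretrained(
        base, "adapters/doc-classification", adapter_name="classify"
    )
    # Load a second adapter into the very same model.
    model.load_adapter("adapters/question-answering", adapter_name="qa")

    # Switch tasks by activating the corresponding adapter;
    # the big base model is shared across both.
    model.set_adapter("classify")
    # ... run document-classification prompts ...
    model.set_adapter("qa")
    # ... run question-answering prompts ...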

Incorporating User Preferences

User preferences and requirements can evolve over time, requiring adjustments to how models respond or behave. Fine-tuning provides a mechanism to incorporate these changes incrementally without retraining from scratch. By training on new datasets that reflect updated user preferences or feedback, organizations can enhance user satisfaction and engagement with their applications.

For instance, a conversational AI can be fine-tuned to align more closely with specific customer service protocols or corporate idioms or styles. This responsiveness not only improves the model’s relevance but also fosters trust and loyalty among users.

Reducing Costs

In many cases, organizations may seek to reduce costs associated with deploying large models due to infrastructure limitations or budget constraints. Fine-tuning allows you to adapt smaller models to achieve performance levels comparable to larger base models.

By hyper-focusing on specific tasks—and forgetting general-purpose linguistic capacities you may not need, such as the ability to answer Wikipedia-like trivia questions—smaller models can be optimized to deliver high-quality outputs without incurring the expenses associated with larger counterparts. This approach enables organizations to maintain competitive performance while minimizing operational costs.

Flavors of Fine-Tuning

Fine-tuning techniques can be categorized based on their computational cost and efficiency. Here’s an overview of the main fine-tuning strategies, sorted from the most costly to the most efficient.

1. Full Parameter Fine-Tuning

In full parameter fine-tuning, all parameters of the pre-trained model are updated during the training process. This approach allows for maximum flexibility and potential performance improvement on specific tasks, as the model can fully adapt to new data. However, it is computationally expensive and requires significant memory resources, making it impractical for very large models or when working with limited hardware.
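As a rough illustration, here is what a single full fine-tuning step looks like in PyTorch with Hugging Face transformers. The model is deliberately tiny to keep the sketch cheap, and the one-sentence batch stands in for your real dataset:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Full fine-tuning: the optimizer sees every parameter in the model.
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # One illustrative training step on a single example.
    batch = tokenizer("Fine-tuning adapts a model to new data.",
                      return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])  # causal-LM loss
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    total = sum(p.numel() for p in model.parameters())
    print(f"updated all {total:,} parameters")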

2. Partial Fine-Tuning

Partial fine-tuning involves updating only a subset of the model’s parameters while keeping others frozen. Typically, this method focuses on the upper layers of the model, which are more task-specific, while lower layers remain unchanged. This approach strikes a balance between performance and resource efficiency, allowing for faster training times and lower memory requirements compared to full fine-tuning.
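A minimal sketch of this idea, assuming GPT-2’s layer layout (other architectures name their blocks differently):

    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Freeze every parameter first.
    for param in model.parameters():
        param.requires_grad = False

    # Unfreeze only the top two transformer blocks; GPT-2 stores its
    # blocks in model.transformer.h.
    for block in model.transformer.h[-2:]:
        for param in block.parameters():
            param.requires_grad = True

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"training {trainable:,} of {total:,} parameters")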

3. Parameter-Efficient Fine-Tuning

Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA (Low-Rank Adaptation) and adapters, involve modifying only a small number of additional parameters while freezing most of the pre-trained model’s weights. This drastically reduces computational costs and storage requirements. PEFT methods maintain comparable performance to full fine-tuning but are much more efficient, making them suitable for low-resource environments and enabling easier deployment across multiple tasks.
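For instance, here is a minimal LoRA setup using Hugging Face’s peft library; the base model and hyperparameters are illustrative:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Inject trainable low-rank matrices into the attention projections;
    # all of the original weights stay frozen.
    config = LoraConfig(
        r=8,                        # rank of the low-rank update matrices
        lora_alpha=16,              # scaling factor for the updates
        target_modules=["c_attn"],  # GPT-2's fused attention projection
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # typically well under 1% of the total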

4. Prompt & Prefix Tuning

Prompt tuning involves adding trainable prompt embeddings to the input data rather than modifying model parameters directly. This technique allows models to adapt to new tasks by optimizing these prompts while keeping the rest of the model frozen. It is a lightweight approach that requires significantly fewer resources than traditional fine-tuning methods.

Similarly, prefix tuning adds trainable tensors to each transformer block in a model. These tensors act as context that guides the model’s output without altering its core parameters. Prefix tuning is efficient and effective for certain applications but may not achieve the same level of performance as more comprehensive fine-tuning methods.
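A minimal prompt-tuning sketch, again with peft; the initialization text is an illustrative placeholder for your task:

    from transformers import AutoModelForCausalLM
    from peft import PromptTuningConfig, PromptTuningInit, get_peft_model

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Twenty trainable "virtual token" embeddings are prepended to every
    # input; the model itself stays completely frozen.
    config = PromptTuningConfig(
        task_type="CAUSAL_LM",
        num_virtual_tokens=20,
        prompt_tuning_init=PromptTuningInit.TEXT,
        prompt_tuning_init_text="Classify the sentiment of this review:",
        tokenizer_name_or_path="gpt2",
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # only the prompt embeddings train

For prefix tuning, peft offers an analogous PrefixTuningConfig that places the trainable tensors at every transformer block rather than only at the input.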

Tips for Effective Fine-Tuning

Fine-tuning can significantly enhance the performance of language models, but to maximize its effectiveness, practitioners should consider several key strategies. As usual, it is critical not to miss the forest for the trees or to fall prey to premature optimization. Here are essential tips for effective fine-tuning.

1. Exhaust Prompt Engineering

This is the most important advice and the most critical mistake I see small organizations making every single time. Before even thinking about fine-tuning, ensure you have thoroughly explored prompt engineering.

If a state-of-the-art model like GPT-4 cannot solve your task with a well-structured prompt and perhaps some few-shot examples, fine-tuning a smaller model is unlikely to yield better results. Effective prompt engineering can often resolve issues without requiring extensive fine-tuning, which is always more costly.
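For illustration, a few-shot prompt might look like the following sketch, here using OpenAI’s Python SDK; the model name and the classification task are placeholders:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # A structured prompt with a couple of few-shot examples. In many
    # cases, iterating on this text solves the task with no training.
    prompt = """You are a support-ticket classifier. Answer with exactly
    one label: billing, technical, or other.

    Ticket: "I was charged twice this month."
    Label: billing

    Ticket: "The app crashes when I upload a file."
    Label: technical

    Ticket: "My invoice shows the wrong company name."
    Label:"""

    response = client.chat.completions.create(
        model="gpt-4o",  # model name is illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)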

2. Quality Trumps Quantity

If you have already decided fine-tuning is the way to go, prioritize data quality. Focus on gathering high-quality, diverse examples that accurately represent the new task or domain. However much data you can gather will be minuscule compared to the base training set, so quality is the only thing you can control.

Keep in mind that while you can leverage large models with clever prompt engineering to synthesize additional training data, you must always validate these examples with human experts to ensure their relevance and correctness. One great novel example beats 100 excellent but similar ones.
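As a sketch of that workflow, again assuming OpenAI’s SDK and an illustrative task, you might generate candidates like this and then route them to reviewers:

    from openai import OpenAI

    client = OpenAI()

    # A real, human-written seed example (content is illustrative).
    seed = 'Ticket: "I was charged twice this month." Label: billing'

    # Ask a strong model for diverse variations; every candidate still
    # goes through a human expert before it enters the training set.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Write 5 distinct variations of this labeled "
                       "example, varying tone, length, and vocabulary:\n"
                       + seed,
        }],
    )
    candidates = response.choices[0].message.content
    print(candidates)  # for human review, not straight into training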

3. Start Small, Grow As Needed

Begin your fine-tuning efforts with parameter-efficient techniques, such as adapters or Low-Rank Adaptation (LoRA). These methods require fewer resources and are often easier to implement than full parameter tuning.

You’ll find tons of open-source implementations of efficient fine-tuning methods, so don’t let the prospect of technical difficulty scare you. Likewise, several LLM providers will let you fine-tune commercial or open-source models on their infrastructure, effectively bypassing any need for self-hosting. This way you get the best of both worlds: a model just for you that someone else takes care of.
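As one example of this hosted route, here is a sketch using OpenAI’s fine-tuning API; the file name is a placeholder for your own chat-formatted data, and the model is one that supported fine-tuning at the time of writing:

    from openai import OpenAI

    client = OpenAI()

    # Upload a JSONL file of chat-formatted training examples.
    training_file = client.files.create(
        file=open("train.jsonl", "rb"),
        purpose="fine-tune",
    )

    # Launch a hosted fine-tuning job; the provider handles all the
    # infrastructure and later serves the resulting model for you.
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-4o-mini-2024-07-18",
    )
    print(job.id, job.status)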

If and only if parameter-efficient fine-tuning does not meet your needs, then consider transitioning to full parameter tuning as a subsequent step, but unless you’re swimming in private data—and I mean, terabytes of data—you’re most likely safely on the efficient fine-tuning side.

4. Stay in the Loop

Finally, remember the field of AI is evolving extremely fast, with new models and techniques appearing basically every single week. A task that may require fine-tuning today could potentially be solved through effective prompting in next-generation models released tomorrow.

Therefore, avoid over-investing in ad-hoc fine-tuning pipelines that may become obsolete in a couple of months. Also, assess the capabilities of newer models regularly and be prepared to pivot away from fine-tuned proprietary models as soon as a good prompt shows even marginally better results. Prompts are far more portable than models.

Conclusion

Fine-tuning is an excellent strategy for small organizations that need to tackle domain-specific tasks effectively. As AI continues to evolve, the ability to adapt large pre-trained models to meet unique business needs is no longer a luxury but a necessity. For smaller organizations and individual developers, fine-tuning offers a pathway to harness the power of advanced language models without the prohibitive costs associated with training from scratch.

Also, leveraging fine-tuning allows small organizations to capitalize on their private data, tailoring models to reflect their specific industry requirements and user preferences. The moat is no longer having the bigger model, since some of the best models are open source. The moat is your private data, and fine-tuning lets you get the best combo: a world-class base model, further trained on privileged data none of your competitors have.

Moreover, in a competitive landscape where larger organizations like Microsoft, Google, and Meta compete on general-purpose base models, fine-tuning gives small businesses an advantage in niche domains. By optimizing existing models with proprietary data, they can differentiate themselves through improved performance and specialized capabilities that address specific market needs.