You need an AI system that knows your company's data. Should you fine-tune a model or use retrieval-augmented generation (RAG)? Here's a decision framework based on what actually works in production.

Quick definitions

RAG (Retrieval-Augmented Generation): Store your documents in a vector database. At runtime, search for relevant chunks and pass them to a general-purpose LLM in the prompt. The model answers based on the retrieved context.

Fine-tuning: Take a base model (GPT-3.5, Llama, Mistral) and train it further on your specific examples. The model internalizes your data and style.

When to choose RAG

Use cases that fit RAG

Advantages of RAG

Disadvantages of RAG

Typical RAG costs

When to choose fine-tuning

Use cases that fit fine-tuning

Advantages of fine-tuning

Disadvantages of fine-tuning

Typical fine-tuning costs

Comparison matrix

Factor RAG Fine-Tuning
Time to production 2–4 weeks 6–12 weeks
Upfront cost €8k–€25k €15k–€50k
Ongoing cost Higher (long prompts) Lower (short prompts)
Latency Higher (2 steps) Lower (1 step)
Updates Easy (add docs) Hard (retrain)
Citations Yes No
Style control Limited Strong
Best for Knowledge Q&A Format/style tasks

Can you combine them?

Yes. Fine-tune for style and structure, then use RAG to inject current facts. Example: A customer support bot fine-tuned to write concise, empathetic responses, with RAG pulling the latest product documentation.

This gives you style consistency (fine-tuning) and up-to-date information (RAG), but adds complexity and cost.

Decision framework

Start with RAG if:

Choose fine-tuning if:

Use both if:

Real examples

Company A – Legal document Q&A: 10,000 contracts, updated quarterly. Used RAG with GPT-4. Built in 3 weeks, €12k dev cost, €800/month run cost. Citations required for compliance.

Company B – Customer support email generation: 50,000 past tickets, consistent tone required. Fine-tuned GPT-3.5 Turbo. Built in 8 weeks, €30k total cost, €400/month hosting. Responses feel "on-brand."

Company C – Technical support bot: Combined fine-tuning (for structured troubleshooting format) + RAG (for latest KB articles). Built in 10 weeks, €40k dev cost, €1,200/month run cost. Best of both worlds.

Next steps

Identify your goal: Are you answering questions from documents (RAG) or generating content in a specific style (fine-tuning)? Start with RAG unless you have clear evidence that style/format matters more than content freshness.

Build a small proof-of-concept with 10–50 examples. Measure accuracy, latency, and user satisfaction. Expand only after you validate the approach.

Most businesses start with RAG and layer fine-tuning later if needed. It's faster, cheaper, and easier to explain to stakeholders.