Adam Fletcher

Data Scientist
AI

March 11, 2026

LLMs explained: Part 2 – Building smarter applications with large language models

Almost every organisation today is facing the same strategic question: how do we harness generative AI without rebuilding our technology stack from the ground up?

The truth is, you don’t need to build ChatGPT (or any large language model) yourself. The world’s most capable models already exist, trained on vast datasets and available via API or open source. The real opportunity lies in how you extend them, integrate them, and align them with your data, compliance requirements, and operational priorities.

At present, success with LLMs isn’t about developing entirely new models; it’s about designing the right architecture around existing ones.

The key is ensuring AI is integrated in the context of your business. You need to align it with your data, your workflows, and the way your teams operate.

There are three main ways to do this:

  • Retrieval-Augmented Generation (RAG).
  • Fine-tuning.
  • LLM agents or model chaining.

Each strategy offers a different balance of flexibility, accuracy, and complexity, and knowing which option is most appropriate for your business is what unlocks real enterprise value.

Retrieval-Augmented Generation (RAG)

What is it?

RAG extends a base model by connecting it to your organisation’s internal knowledge. It works by storing document embeddings (mathematical representations of meaning) in a vector database. When a user submits a query, the system retrieves the most relevant information, feeds it into the prompt, and the LLM produces a grounded, contextual answer.

Essentially, RAG gives an LLM real-time access to your company’s knowledge base and grounds its answers in retrievable source material, which reduces hallucinations.
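The retrieval step can be sketched in a few lines of plain Python. This is a toy illustration only: the three-dimensional “embeddings”, document texts, and helper names stand in for a real embedding model and vector database.

```python
# Minimal sketch of the RAG retrieval step. The toy 3-dimensional
# "embeddings" stand in for a real embedding model and vector database.
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Document store: (text, embedding) pairs. In production these vectors
# come from an embedding model and live in a vector database.
docs = [
    ("Our refund policy allows returns within 30 days.", [0.9, 0.1, 0.0]),
    ("The cafeteria opens at 8am on weekdays.",          [0.0, 0.8, 0.3]),
    ("Support tickets are triaged within 4 hours.",      [0.2, 0.1, 0.9]),
]

def retrieve(query_embedding, k=1):
    """Return the k documents most similar to the query."""
    ranked = sorted(docs, key=lambda d: cosine(query_embedding, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_embedding):
    """Ground the prompt in retrieved context before calling the LLM."""
    context = "\n".join(retrieve(query_embedding))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# A query about refunds embeds close to the refund-policy document,
# so that document is retrieved and injected into the prompt.
prompt = build_prompt("How long do customers have to return items?", [0.85, 0.05, 0.1])
print(prompt)
```

The LLM then answers from the injected context rather than from its training data alone, which is what makes the response auditable against your own sources.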

Organisational value

Most enterprise information is scattered across systems: wikis, PDFs, ticketing tools, CRMs, and intranets. RAG bridges those silos without retraining. It’s also one of the most cost-effective ways to make an LLM truly useful inside an organisation while maintaining data privacy and ownership.

The pros

  • RAG accesses dynamic or private data without retraining.
  • Access to specific data can be restricted, much like in a database.
  • It is faster and cheaper than fine-tuning.
  • Hallucinations are reduced by grounding responses in verified content.
  • Easy to scale as your document store grows.

The cons

  • Retrieval misses can cause inaccurate responses.
  • It can add latency to the inference process.
  • RAG requires thoughtful architecture and maintenance of embeddings, search indexes, and caching.
  • It struggles to provide accurate responses to complex questions that require combining multiple contexts.

When to use it

RAG is ideal when your knowledge changes frequently or is spread across multiple sources, e.g. internal knowledge bases, policy documents, or product information.

It’s a strong foundation for internal chat assistants, customer service tools, and support knowledge automation.

Fine-tuning

What is it?

Fine-tuning retrains an existing model using your organisation’s own data, so it learns to mirror your specific language, tone, and logic.

Rather than prompting a general-purpose model to act as a particular persona, you feed it thousands of your organisation’s own examples until it becomes one.
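In practice, those examples are usually assembled into a supervised training file. The sketch below uses the common chat-style JSONL format; the example texts, system prompt, and file name are illustrative, and the exact schema depends on your model provider or training framework.

```python
# Hedged sketch: assembling supervised fine-tuning examples in a
# chat-style JSONL format. Texts and file name are illustrative.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are ACME's claims assistant. Be concise and cite the policy clause."},
        {"role": "user", "content": "Is water damage from a burst pipe covered?"},
        {"role": "assistant", "content": "Yes, under clause 4.2, sudden escape of water is covered."},
    ]},
    # ...in a real run, thousands more reviewed, real-world interactions...
]

# One JSON object per line, the usual shape for fine-tuning datasets.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every example must round-trip as valid JSON.
with open("train.jsonl") as f:
    lines = [json.loads(line) for line in f]
print(len(lines), "training examples written")
```

The quality and diversity of these examples matter more than the training mechanics: the model will faithfully reproduce whatever tone, structure, and biases the data contains.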

Organisational value

Fine-tuning enables deeper contextual understanding and more consistent outputs. It’s the best way to achieve domain-specific excellence, especially in regulated or highly technical fields where nuance and precision matter.

The pros

  • High accuracy rates for specialised or repetitive tasks.
  • No retrieval layer is needed, which keeps the runtime architecture simpler.
  • Fine-tuning produces consistent, structured outputs.

The cons

  • Knowledge is static, so once trained, the model doesn’t learn from new data.
  • Can be expensive and time-intensive to maintain.
  • There’s a risk of overfitting if training data lacks diversity.
  • Fine-tuning can introduce hallucinations that are hard to trace back to a source, making them difficult to validate.

When to use it

Fine-tuning works best when accuracy and compliance outweigh flexibility, for example:

  • Automated report generation.
  • Customer service response drafting.
  • Financial, legal, or policy summarisation.
  • Content review and tagging.

LLM agents and model chaining

What is it?

LLM agents take the concept of combining approaches further, by orchestrating multiple models or “personas” that collaborate to solve complex tasks. Each agent can perform a role, for example, extracting data, validating facts, drafting outputs, or calling APIs.

This is sometimes called model chaining or tool-using AI, where models reason across steps rather than respond to a single prompt.
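A chain like this can be sketched as a pipeline of role-specific functions. In the toy example below, each “agent” is a stubbed Python function standing in for an LLM call with one responsibility; all names, rules, and the invoice scenario are illustrative.

```python
# Minimal model-chaining sketch. Each "agent" is a stub standing in
# for an LLM call with a single role; names and rules are illustrative.
def extract(document: str) -> dict:
    """Extraction agent: pull structured fields out of raw text."""
    amount = next(w for w in document.split() if w.startswith("£"))
    return {"amount": amount}

def validate(fields: dict) -> dict:
    """Validation agent: check extracted fields against business rules."""
    fields["valid"] = fields["amount"].lstrip("£").replace(",", "").isdigit()
    return fields

def draft(fields: dict) -> str:
    """Drafting agent: turn validated fields into output for a human."""
    status = "approved" if fields["valid"] else "flagged for review"
    return f"Invoice for {fields['amount']} has been {status}."

def run_chain(document: str) -> str:
    """Orchestrator: pass state through the agents in sequence."""
    return draft(validate(extract(document)))

result = run_chain("Invoice INV-104: please pay £1,250 by Friday.")
print(result)
```

Real frameworks add state management, retries, and tool calling on top, but the core idea is the same: intermediate outputs flow between specialised steps instead of one model answering in a single shot.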

Organisational value

Agents move LLMs beyond simple question answering and into multi-step reasoning and execution. They can apply decision logic, call tools, and take actions across systems. In many cases, this moves AI from advisory to operational.

The pros

  • Agents can handle multi-stage reasoning and iterative problem-solving.
  • They can combine specialisms, e.g. drafting, reviewing, and validating.
  • Decomposing a task into discrete steps makes each stage simpler to validate.
  • This approach enables automation across applications and APIs.

The cons

  • Agents can be slower and more expensive as multiple models often run in sequence.
  • This approach requires robust orchestration and state management.
  • Performance is still limited by the base model’s reasoning capabilities.

When to use it

Agents make sense when the process involves several dependent steps or decision points, for example:

  • Document analysis and summarisation workflows.
  • Technical report creation and validation.
  • Complex query routing and information retrieval.
  • Automated business process orchestration.

This approach produces systems that don’t just respond to prompts, but collaborate and act. You could combine it with RAG to enrich your data; however, the expertise and costs required often make that impractical.

Optimising models for efficiency, privacy, and performance

The AI landscape is shifting. It’s no longer about building ever-larger models; it’s about making them smarter, faster, and more cost-effective to deploy.

New optimisation techniques now allow organisations to run high-performing language models with fewer resources.

For example, developers can compress or adapt a large model into a smaller one that retains most of its accuracy but costs far less to operate, a process known as ‘model distillation’. These lighter models can be deployed on-premises, improving privacy and responsiveness, although this is not yet common practice.
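The core idea behind distillation is that a small “student” model is trained to match the “teacher’s” softened output distribution, not just its top answer. The sketch below shows that loss in plain Python for illustration; real training would use a framework such as PyTorch, and the example logits are made up.

```python
# Hedged sketch of the core distillation loss: the student is trained
# to match the teacher's temperature-softened output distribution.
# Pure Python for illustration; real training uses a framework.
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's to the teacher's distribution."""
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical logits give zero loss; diverging logits give a positive loss.
loss_same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
loss_diff = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
print(loss_same, loss_diff)
```

The temperature is what makes distillation work: softening both distributions exposes the teacher’s relative preferences among wrong answers, which carries more signal than the hard labels alone.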

At the same time, open-source models such as Qwen, DeepSeek, and OpenAI’s GPT-OSS are quickly catching up with the performance of commercial systems.

For enterprises, this opens up new options: lower infrastructure costs, greater control over data and compliance, and the flexibility to combine proprietary and open-source models in a single architecture.

A modern enterprise AI strategy often looks like this:

  • Commercial models (OpenAI, Anthropic) for advanced reasoning.
  • Fine-tuned open-source models for domain-specific use cases.
  • Distilled or quantised models for high-volume, latency-sensitive tasks.

This hybrid model architecture offers the best blend of innovation, privacy, and cost optimisation.
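A hybrid architecture usually needs a thin routing layer that dispatches each request to the right model tier. The sketch below is a toy illustration of that idea; the tier names, task fields, and routing rules are all assumptions, not a real API.

```python
# Hedged sketch of a hybrid routing layer: each request is dispatched
# to the model tier that fits it. Names and rules are illustrative.
def route(task: dict) -> str:
    """Pick a model tier for a task based on simple business rules."""
    if task.get("latency_sensitive"):
        return "distilled-on-prem"       # high-volume, low-latency tier
    if task.get("domain") in {"legal", "finance"}:
        return "fine-tuned-open-source"  # domain-specific tier
    return "commercial-api"              # advanced general reasoning

print(route({"latency_sensitive": True}))  # distilled-on-prem
print(route({"domain": "legal"}))          # fine-tuned-open-source
print(route({"domain": "marketing"}))      # commercial-api
```

In production this logic is often driven by cost, data-residency, and confidence thresholds rather than hard-coded rules, but the shape is the same: one entry point, several model back-ends.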

Core insights

There’s no one-size-fits-all path to integrating LLMs. The right approach depends on your data maturity, operational priorities, and governance model.

  • Dynamic or fast-changing data → Retrieval-Augmented Generation (RAG).
  • High accuracy and compliance → Fine-tuning.
  • Complex multi-step workflows → LLM agents / model chaining.

In essence:

  • RAG gives you access to knowledge.
  • Fine-tuning gives you precision.
  • Agents give you reasoning and action.

The most advanced enterprises are now combining these techniques, creating layered, adaptive AI systems that evolve with their business.

Architect, don’t experiment

Generative AI isn’t just another technology trend. It represents a shift in how modern systems are designed and deployed.

The real differentiator won’t be which model you use, but how well it connects to your data, aligns with your governance, and fits into your existing systems.

Building AI capabilities that are reliable, explainable, and secure will matter far more than raw output quality.

About the author

Adam Fletcher is a Data Scientist and former cancer researcher with extensive experience analysing data and building ML/AI systems that solve real problems. He has over 10 years of experience in hypothesis-driven research, ranging from modelling chemotherapy resistance to designing non-invasive prenatal tests for genetic abnormalities. At Equal Experts, he specialises in delivering data science and AI solutions across multiple sectors, including retail, government, and manufacturing. Highly technical and hands-on, he combines research discipline with pragmatic delivery, turning messy data and complex requirements into intelligent products and actionable insight.

Disclaimer

This blog is a record of our experiments and experiences with AI. It reflects what we tried, learned, and observed, but does not represent Equal Experts’ official practices or methodologies. The approaches described here may not suit every context, and your results may vary depending on your goals, data, and circumstances.

