AI Knowledge — 02

Agent Architecture: How It Actually Works

An AI Agent is not just an LLM with a chat interface. It is a system of components: a backend that orchestrates flow, a RAG layer that retrieves context, tools that take real actions, and an LLM that decides what to do next. Here is how they connect.

Full Architecture

User → UI (Frontend) → Backend / Server (Flow Orchestrator)

Inside the backend:
RAG: context retrieval
Tool Executor: runs actual tools
Flows: purpose-specific orchestration logic

Backend → LLM API: sends prompt
LLM API → Backend: tool call / answer
(loop until done)

LLM API: decides whether to answer or use a tool

Each Component's Role

Backend — the orchestrator

This is where your application logic lives. It receives the user query, decides which flow to run, calls RAG, assembles the prompt, and manages the loop with the LLM.

The LLM does not orchestrate. The backend does. The LLM only decides what to do next within each turn.

RAG — context retrieval

Before the LLM sees the question, the backend searches a vector database for relevant documents or records. That retrieved context is added to the prompt so the LLM answers from real data, not from training memory.

In this architecture, RAG runs before the LLM call, not after.
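The retrieve-then-prompt step can be sketched as follows. This is a minimal illustration, not a production design: the in-memory `DOCUMENTS` list stands in for a vector database, the word-count `embed` function stands in for a real embedding model, and the sample records and all function names are invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical sample records standing in for a vector database.
DOCUMENTS = [
    "Traceability record: lot A2241 was processed on line 3.",
    "Past NCR: tensile test failure linked to furnace temperature drift.",
    "Cafeteria menu for the coming week.",
]

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Place the retrieved context ahead of the question, before any LLM call is made."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("Lot A2241 failed tensile test. What do we do?")` yields a prompt whose context holds the two manufacturing records and leaves out the unrelated one. The backend would send that prompt to the LLM.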

Tool Executor — actions

Tools are functions the backend can run: query a database, call an API, write a file, send a notification. The LLM receives a list of tool definitions (names, descriptions, and usually a parameter schema) and decides which one to call, if any.

The LLM outputs a tool call request. The backend executes it. The LLM does not run tools directly.
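One way to keep that separation explicit is a small registry on the backend: the definitions go to the LLM, the functions stay on the server. The sketch below is an assumption-laden illustration. The `tool` decorator, the `send_notification` stub, and the shape of the tool call dictionary are all hypothetical, and real LLM APIs each define their own tool-call format.

```python
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}      # name -> function, never sent to the LLM
TOOL_DEFINITIONS: list[dict[str, Any]] = []    # what the LLM is shown

def tool(name: str, description: str, parameters: dict[str, str]):
    """Register a function as a tool and record its definition for the LLM."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = fn
        TOOL_DEFINITIONS.append(
            {"name": name, "description": description, "parameters": parameters}
        )
        return fn
    return register

@tool(
    name="send_notification",
    description="Send a notification to a recipient.",
    parameters={"recipient": "string", "message": "string"},
)
def send_notification(recipient: str, message: str) -> dict[str, str]:
    # Stub: a real tool would call a messaging service here.
    return {"status": "queued", "recipient": recipient}

def execute_tool_call(call: dict[str, Any]) -> dict[str, Any]:
    """Run a tool call requested by the LLM; errors are returned as data, not raised."""
    name = call.get("name")
    fn = TOOLS.get(name)
    if fn is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return {"result": fn(**call.get("arguments", {}))}
    except TypeError as exc:  # e.g. missing or unexpected arguments
        return {"error": f"bad arguments for {name}: {exc}"}
```

Returning errors as data lets the backend feed them back to the LLM on the next turn, so a malformed request does not crash the request handler.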

LLM — the decision engine

Given the user message, retrieved context, and available tools, the LLM does one of two things: produce a final answer, or request a tool call. That is its entire job in the loop.
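Because the LLM's output is one of exactly two things, the backend can model it as a two-case type. The sketch below assumes a hypothetical JSON wire format (`{"answer": ...}` or `{"tool": ..., "arguments": ...}`). Real LLM APIs return tool calls in their own structures, so treat the field names as placeholders.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Union

@dataclass
class FinalAnswer:
    text: str

@dataclass
class ToolCall:
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)

Decision = Union[FinalAnswer, ToolCall]

def parse_decision(raw: str) -> Decision:
    """Turn the LLM's raw output (hypothetical JSON format) into one of the two cases."""
    data = json.loads(raw)
    if "tool" in data:
        return ToolCall(name=data["tool"], arguments=data.get("arguments", {}))
    if "answer" in data:
        return FinalAnswer(text=data["answer"])
    raise ValueError("response is neither an answer nor a tool call")
```

The backend then branches on the type: a `ToolCall` goes to the tool executor, a `FinalAnswer` goes back to the user.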

The Loop — what makes it an Agent

A regular chatbot calls the LLM once and returns the response. An Agent can call the LLM multiple times within a single user request until it has gathered everything it needs.

1. User: "Lot A2241 failed tensile test. What do we do?"
2. Backend: RAG search → retrieves traceability records + past NCRs
3. LLM: Decides: use MES tool to check in-process lots
4. Backend: Executes MES tool → returns 8 lots on the line
5. LLM: Decides: use Regulatory DB to check notification rules
6. Backend: Executes Regulatory DB tool → AS9100 §8.7, 72h notice required
7. LLM: Has enough data. Generates final answer.
8. User: Receives: impact summary + actions + draft documents

Steps 3–6 are the loop. The LLM decided to use two tools before it had enough information to answer. A simple chatbot would have guessed after step 2. The Agent waited until it had real data.
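The loop itself is short. The sketch below replays the walkthrough above with a scripted stand-in for the LLM, so it runs without any API. `run_agent`, `make_scripted_llm`, the tool names, and the decision dictionaries are all hypothetical, and the tool stubs simply return the results quoted in the steps.

```python
from typing import Any, Callable

def run_agent(
    llm: Callable[[list[dict[str, Any]]], dict[str, Any]],
    tools: dict[str, Callable[..., Any]],
    messages: list[dict[str, Any]],
    max_turns: int = 5,
) -> str:
    """Call the LLM repeatedly until it answers; the backend executes each requested tool."""
    for _ in range(max_turns):
        decision = llm(messages)
        if decision["type"] == "answer":
            return decision["content"]
        # The LLM asked for a tool: the backend runs it and appends the result.
        name = decision["name"]
        result = tools[name](**decision.get("arguments", {}))
        messages.append({"role": "tool", "name": name, "content": result})
    raise RuntimeError("agent did not finish within max_turns")

def make_scripted_llm(script: list[dict[str, Any]]):
    """Stand-in for a real LLM API: returns pre-written decisions in order."""
    steps = iter(script)
    def llm(messages: list[dict[str, Any]]) -> dict[str, Any]:
        return next(steps)
    return llm

# Stub tools returning the results from the walkthrough.
TOOLS = {
    "mes_lookup": lambda: "8 lots on the line",
    "regulatory_db": lambda: "AS9100 §8.7, 72h notice required",
}

SCRIPT = [
    {"type": "tool_call", "name": "mes_lookup", "arguments": {}},
    {"type": "tool_call", "name": "regulatory_db", "arguments": {}},
    {"type": "answer", "content": "Impact summary, actions, and draft documents."},
]
```

The `max_turns` cap is a design choice added for the sketch: without it, an LLM that keeps requesting tools would hold the request open indefinitely.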

Flows — purpose-specific logic

Not every request should follow the same path. For a specific type of task, a Flow defines which tools are available, in what order steps run, and how results are handled.

nonconformance_flow

RAG: traceability lookup
Tool: MES hold
Tool: Regulatory check
Tool: Report generator
LLM: summarize + recommend

supply_query_flow

RAG: supplier history
Tool: Inventory check
Tool: Lead time lookup
LLM: summarize options

general_qa_flow

RAG: knowledge base search
LLM: answer from context

The backend routes each incoming request to the appropriate flow based on intent classification, which is often done by the LLM itself as the first step of the request.
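Routing can be sketched as a lookup from a classified intent to a flow definition. In the illustration below the three flows mirror the ones listed above, while the `Flow` dataclass, the intent keys, and the keyword-based `classify_intent` are assumptions made for the example. As noted, a real system would often make an LLM call to classify intent.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    name: str
    rag_source: str
    tools: tuple[str, ...]
    llm_step: str

FLOWS = {
    "nonconformance": Flow(
        "nonconformance_flow", "traceability lookup",
        ("MES hold", "Regulatory check", "Report generator"), "summarize + recommend",
    ),
    "supply_query": Flow(
        "supply_query_flow", "supplier history",
        ("Inventory check", "Lead time lookup"), "summarize options",
    ),
    "general_qa": Flow(
        "general_qa_flow", "knowledge base search", (), "answer from context",
    ),
}

# Hypothetical keyword sets; a stand-in for LLM-based intent classification.
NONCONFORMANCE_WORDS = {"failed", "nonconformance", "ncr", "defect"}
SUPPLY_WORDS = {"supplier", "inventory", "lead"}

def classify_intent(query: str) -> str:
    """Pick an intent key by whole-word keyword match, defaulting to general Q&A."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    if words & NONCONFORMANCE_WORDS:
        return "nonconformance"
    if words & SUPPLY_WORDS:
        return "supply_query"
    return "general_qa"

def route(query: str) -> Flow:
    """Return the flow the backend should run for this query."""
    return FLOWS[classify_intent(query)]
```

Here `route("Lot A2241 failed tensile test. What do we do?")` selects `nonconformance_flow`, so only that flow's tools would be offered to the LLM for the request.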

The key distinction

General AI (chatbot) | AI Agent
LLM generates response from training data | LLM uses RAG to answer from your actual data
One LLM call per user message | Multiple LLM calls within one request (the loop)
LLM cannot take actions | Backend executes real tools on LLM's instruction
No concept of flow or sequence | Backend routes requests through purpose-specific flows