
RAG vs Graph vs Hybrid for Industrial Knowledge

Author: Mandelbulb Technologies

7 MIN READ · August 22, 2025

Field notes from three Plant Brain deployments on when vector RAG wins, when graph retrieval wins, when the time-series layer wins, and when you have to combine all three. The retrieval pattern, not the model, is the question that takes longest to answer.

We've shipped three Plant Brain deployments at different points on the manufacturing spectrum — a Tier-2 auto supplier in Pune, a process plant in the Middle East, and a specialty chemicals operator in Europe. In all three, the question that took longest to answer was not which model. It was which retrieval pattern.

Vector RAG is the default answer most teams reach for, because it's the one everyone has built before. It is wrong roughly half the time in industrial settings, and the cost of being wrong is a plant operator who stops trusting the agent.

Here is what we've learned about when each pattern wins.

The four shapes of industrial knowledge

Before you can pick retrieval, you have to know what you're retrieving. In every plant we've worked in, the knowledge comes in four shapes:

  1. Free-text documents — SOPs, operator handbooks, vendor manuals, training material. Long-form, written by humans for humans.

  2. Structured engineering artifacts — P&IDs, equipment hierarchies, BOMs, asset registers, control narratives. Graph-shaped at heart.

  3. Time-series and event logs — downtime tickets, SCADA alarms, quality holds, batch records. Tabular and chronological.

  4. Cross-domain artifacts — a downtime ticket that references a P&ID node and quotes an SOP step. Most operator questions in production land here.

The first three each want a different retrieval pattern. The fourth requires you to combine them. There is no single right answer.

When vector RAG wins

Vector RAG is the right tool for shape 1. SOPs, operator handbooks, and training documents are dense free text, and an operator's question usually sits close, semantically, to the passage that answers it. A well-tuned embedder plus a reranker handles 90% of operator questions of the form "how do I…" or "what is the procedure for…".

Our default stack for this layer:

  • Embedder: a domain-tuned bge-large variant, fine-tuned on plant SOPs.

  • Vector store: pgvector for the operator-facing surface, because the rest of the operational data already lives in Postgres and one fewer system is one fewer outage.

  • Reranker: a cross-encoder running on the same on-prem box as the LLM.

Latency at p95 sits around 380 ms. Accuracy on operator-question evals sits at 91%.
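The two-stage pattern described above (embed, retrieve by vector similarity, then rerank) can be sketched in a few lines. This is a minimal illustration in pure Python, not the production stack: the hand-written vectors stand in for embedder output, the token-overlap scorer stands in for the cross-encoder, and the SOP titles and function names are hypothetical.

```python
import math
from typing import Callable

Vector = list[float]

# Hypothetical mini-corpus: (passage text, stand-in embedding).
CORPUS: list[tuple[str, Vector]] = [
    ("SOP-12: How to prime the feed pump before startup", [0.9, 0.1, 0.0]),
    ("SOP-31: Procedure for pump seal replacement", [0.8, 0.2, 0.1]),
    ("Handbook: Canteen opening hours", [0.0, 0.1, 0.9]),
]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def token_overlap(query: str, passage: str) -> float:
    """Toy reranker: counts shared lowercase tokens (a cross-encoder stand-in)."""
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def retrieve_then_rerank(
    query: str,
    query_vec: Vector,
    corpus: list[tuple[str, Vector]],
    rerank_score: Callable[[str, str], float],
    k_retrieve: int = 2,
    k_final: int = 1,
) -> list[str]:
    # Stage 1: cheap vector similarity narrows the corpus to k_retrieve candidates.
    candidates = sorted(
        corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )[:k_retrieve]
    # Stage 2: the more expensive scorer reorders only those candidates.
    reranked = sorted(
        candidates, key=lambda item: rerank_score(query, item[0]), reverse=True
    )
    return [text for text, _ in reranked[:k_final]]


QUERY = "what is the procedure for pump seal replacement"
TOP = retrieve_then_rerank(QUERY, [0.85, 0.15, 0.05], CORPUS, token_overlap)
print(TOP)
```

The point of the split is cost: the reranker only ever sees the handful of candidates the vector search returns, which is what makes it practical to run a cross-encoder on the same box as the LLM.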

This is the table-stakes layer. If your industrial AI doesn't do at least this well, nothing else matters.

When graph retrieval wins

Vector RAG is wrong for shape 2.

Ask an operator: "If pump P-204 fails, what tanks am I going to lose, and which batches are in flight?"

The answer is in the P&ID. It is a graph traversal — pump P-204 feeds tanks T-301 and T-303; T-301 currently holds batch B-1947; B-1947 is on the schedule for the 16:00 customer ship. There is no embedding distance that gets you to that answer. The relationships are explicit, deterministic, and structured.

Trying to do this in vector space is the single most common failure mode we see in industrial RAG deployments. The agent retrieves the SOP about pump failures and writes a generic answer that names no tanks, no batches, and no customers.
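To make the contrast concrete, here is a minimal sketch of the P-204 question as a breadth-first traversal over typed edges. The edge list reuses the example above; the relation names (FEEDS, HOLDS, SCHEDULED_FOR) and the function name are assumptions for illustration, and a real deployment would run the equivalent query against the graph store rather than an in-memory list.

```python
from collections import deque

Edge = tuple[str, str, str]  # (source, relation, target)

# Edges taken from the worked example; relation names are hypothetical.
EDGES: list[Edge] = [
    ("P-204", "FEEDS", "T-301"),
    ("P-204", "FEEDS", "T-303"),
    ("T-301", "HOLDS", "B-1947"),
    ("B-1947", "SCHEDULED_FOR", "16:00 customer ship"),
]


def downstream_impact(start: str, edges: list[Edge]) -> list[Edge]:
    """Return every edge reachable from `start`, in breadth-first order."""
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))

    seen = {start}
    queue = deque([start])
    impacted: list[Edge] = []
    while queue:
        node = queue.popleft()
        for rel, dst in adjacency.get(node, []):
            if dst not in seen:  # guard against cycles in the plant graph
                seen.add(dst)
                impacted.append((node, rel, dst))
                queue.append(dst)
    return impacted


for src, rel, dst in downstream_impact("P-204", EDGES):
    print(f"{src} -[{rel}]-> {dst}")
```

Every hop in the output is an explicit edge, so the answer names the tanks, the batch, and the shipment, which is exactly what a similarity search over SOP text cannot produce.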

Our graph layer:

  • Store: Neo4j on-prem, populated from the asset register, P&ID exports, and the MES.

  • Query layer: LLM-generated Cypher with strict schema constraints, and a fallback to canned templates for the 40 most common operator questions.

  • Reconciliation: a nightly job that compares the live SCADA tag list against the graph and flags drift.
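One way the query layer's guardrails could look is sketched below: generated Cypher is accepted only if it is read-only and uses labels and relationship types from an allow-list, otherwise a canned template is used. Everything here is an assumption for illustration, including the label set, the template, the intent name, and the try-generated-then-template order. The regex check is deliberately crude; it is not a Cypher parser and would normally be backed by read-only execution on the database side.

```python
import re
from typing import Optional

# Hypothetical schema allow-lists.
ALLOWED_LABELS = {"Pump", "Tank", "Batch"}
ALLOWED_RELS = {"FEEDS", "HOLDS"}

# Any write or procedure keyword rejects the query outright (conservative).
WRITE_KEYWORDS = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|CALL|LOAD)\b", re.IGNORECASE
)

# Hypothetical canned template, keyed by intent name.
TEMPLATES = {
    "pump_failure_impact": (
        "MATCH (p:Pump {tag: $tag})-[:FEEDS]->(t:Tank) "
        "OPTIONAL MATCH (t)-[:HOLDS]->(b:Batch) "
        "RETURN t.tag AS tank, b.id AS batch"
    ),
}


def is_safe_cypher(query: str) -> bool:
    """Crude schema check: read-only, known labels, known relationship types."""
    if WRITE_KEYWORDS.search(query):
        return False
    labels = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", query))
    rels = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", query))
    return labels <= ALLOWED_LABELS and rels <= ALLOWED_RELS


def choose_query(intent: str, generated: Optional[str]) -> Optional[str]:
    """Prefer a generated query that passes the check, else the canned template."""
    if generated is not None and is_safe_cypher(generated):
        return generated
    return TEMPLATES.get(intent)  # None means: refuse rather than guess


print(choose_query("pump_failure_impact", "MATCH (n:Pump) DETACH DELETE n"))
```

Returning None when neither path applies is a design choice in this sketch: a refusal is cheaper than a confident answer built on a query the schema never sanctioned.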

This took us longer to build than any other component, because it is the layer where the engineering team has to talk to the operations team and agree on truth. Worth every hour. The questions an operator asks of a working graph layer are the questions they would never have bothered to ask a dashboard.

When the time-series layer wins

Shape 3 (downtime tickets, alarms, quality holds) is tabular. The right retrieval is SQL or KQL against the warehouse. We use Microsoft Fabric's Eventhouse for customers that have committed to the Fabric stack, and a Timescale-on-Postgres pattern for the on-prem deployments.

The agent calls this via a typed tool, not via natural language. We learned this the hard way: text-to-SQL on industrial schemas is a coin flip, and the cost of a wrong answer about downtime is an operator who calls the wrong supervisor. Type the tool, constrain the query shape, validate the output. Boring engineering. Right answer.
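"Type the tool, constrain the query shape, validate the output" might look like the sketch below. It uses an in-memory SQLite database purely so the example runs anywhere; the table, columns, sample rows, and function names are hypothetical stand-ins, not the Eventhouse or Timescale schema.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DowntimeQuery:
    """The only inputs the agent may supply; everything else is fixed."""
    line_id: str
    start: date
    end: date
    limit: int = 50

    def __post_init__(self) -> None:
        if self.start > self.end:
            raise ValueError("start must not be after end")
        if not 1 <= self.limit <= 500:
            raise ValueError("limit must be between 1 and 500")


# Fixed query shape: the model fills parameters, never SQL text.
DOWNTIME_SQL = """
SELECT ticket_id, line_id, started_on, minutes_down
FROM downtime_tickets
WHERE line_id = ? AND started_on BETWEEN ? AND ?
ORDER BY started_on
LIMIT ?
"""
COLUMNS = ("ticket_id", "line_id", "started_on", "minutes_down")


def downtime_tool(conn: sqlite3.Connection, q: DowntimeQuery) -> list[dict]:
    rows = conn.execute(
        DOWNTIME_SQL, (q.line_id, q.start.isoformat(), q.end.isoformat(), q.limit)
    ).fetchall()
    result = [dict(zip(COLUMNS, row)) for row in rows]
    for row in result:  # validate the output before the LLM ever sees it
        if row["minutes_down"] < 0:
            raise ValueError(f"negative downtime in ticket {row['ticket_id']}")
    return result


def demo_connection() -> sqlite3.Connection:
    """Hypothetical sample data so the sketch is runnable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE downtime_tickets "
        "(ticket_id TEXT, line_id TEXT, started_on TEXT, minutes_down INTEGER)"
    )
    conn.executemany(
        "INSERT INTO downtime_tickets VALUES (?, ?, ?, ?)",
        [
            ("DT-1", "L1", "2025-01-05", 42),
            ("DT-2", "L1", "2025-01-20", 15),
            ("DT-3", "L2", "2025-01-10", 90),
            ("DT-4", "L1", "2025-02-02", 30),
        ],
    )
    return conn


print(downtime_tool(demo_connection(), DowntimeQuery("L1", date(2025, 1, 1), date(2025, 1, 31))))
```

Because the SQL text is a constant and the inputs are validated before execution, the worst a confused model can do is ask for the wrong line or date range, which is an error an operator can see and correct.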

When you need hybrid

Most real questions are shape 4. The operator asks: "Why did batch B-1947 fail QA, what did we do last time something like this happened, and what does the SOP say to do now?"

That question wants:

  • The time-series layer to find batch B-1947 and the QA result.

  • The graph layer to identify the equipment involved and the upstream batches.

  • Vector RAG to find the relevant SOP and the postmortem from the previous similar incident.

  • A frontier LLM to write the answer.

The agent orchestrates this as a deterministic plan, not a free-running ReAct loop. We tried the ReAct version. It worked beautifully on demos and broke beautifully in production, because LLMs hallucinate tool calls when they're tired.

Our orchestration spine is LangGraph for the deterministic plan, with CrewAI subagents for the parts that genuinely need multi-step reasoning. That hybrid pattern is its own field note (coming).
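The difference between a deterministic plan and a free-running loop is easiest to see in code. The sketch below is written in plain Python rather than LangGraph so that nothing depends on a specific framework API. The step functions are stubs returning hypothetical data (the tank and batch IDs reuse the earlier example, the document names are invented); the part that matters is that the step order is fixed in code and the model never chooses which tool to call next.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Step:
    name: str
    run: Callable[[dict], Any]  # receives everything produced so far


# Stub retrievers; real ones would call the time-series, graph and vector layers.
def lookup_batch(ctx: dict) -> dict:
    return {"batch": "B-1947", "qa_result": "fail"}


def find_equipment(ctx: dict) -> dict:
    batch = ctx["time_series"]["batch"]
    return {"batch": batch, "equipment": ["T-301"]}


def find_documents(ctx: dict) -> dict:
    equipment = ctx["graph"]["equipment"]
    return {"sop": f"SOP for {equipment[0]}", "postmortem": "previous similar incident"}


def write_answer(ctx: dict) -> str:
    # Stand-in for the LLM call: it only sees retrieved, structured context.
    return (
        f"{ctx['time_series']['batch']} QA result: {ctx['time_series']['qa_result']}; "
        f"equipment: {', '.join(ctx['graph']['equipment'])}; "
        f"see {ctx['vector_rag']['sop']}"
    )


# The plan is data: a fixed, inspectable sequence.
HYBRID_PLAN = [
    Step("time_series", lookup_batch),
    Step("graph", find_equipment),
    Step("vector_rag", find_documents),
    Step("answer", write_answer),
]


def describe(plan: list[Step]) -> str:
    """Human-readable preview that can be shown before anything runs."""
    return " -> ".join(step.name for step in plan)


def execute_plan(plan: list[Step], question: str) -> dict:
    context: dict = {"question": question}
    for step in plan:
        context[step.name] = step.run(context)
    return context


print(describe(HYBRID_PLAN))
print(execute_plan(HYBRID_PLAN, "Why did batch B-1947 fail QA?")["answer"])
```

Since the plan is a plain list, it can be logged, diffed, and previewed, and a failure points at a named step rather than at an opaque chain of model decisions.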

A picture of the stack

Operator question
        │
        ▼
   Intent classifier (Construction-style SLM, fine-tuned per plant)
        │
        ├──► Shape 1 (procedural) ──► Vector RAG ──► LLM
        │
        ├──► Shape 2 (structural) ──► Graph (Cypher) ──► LLM
        │
        ├──► Shape 3 (operational) ──► Typed SQL tool ──► LLM
        │
        └──► Shape 4 (hybrid) ──► Deterministic plan
                                    │
                                    ├──► Time-series
                                    ├──► Graph
                                    └──► Vector RAG ──► LLM
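The top of the diagram is a dispatch table: classify the question into a shape, then look up the handler. A minimal sketch follows, with a keyword matcher standing in for the fine-tuned classifier. The keyword lists, route names, and the default to the procedural route are all assumptions for illustration.

```python
from enum import Enum


class Shape(Enum):
    PROCEDURAL = 1
    STRUCTURAL = 2
    OPERATIONAL = 3
    HYBRID = 4


# Shape -> retrieval path, mirroring the diagram.
ROUTES = {
    Shape.PROCEDURAL: "vector_rag",
    Shape.STRUCTURAL: "graph_cypher",
    Shape.OPERATIONAL: "typed_sql_tool",
    Shape.HYBRID: "deterministic_plan",
}

# Hypothetical keyword cues; a trained classifier would replace these.
CUES = {
    Shape.PROCEDURAL: ("how do i", "procedure", "sop"),
    Shape.STRUCTURAL: ("fails", "feeds", "upstream", "downstream"),
    Shape.OPERATIONAL: ("downtime", "alarm", " qa"),
}


def classify(question: str) -> Shape:
    """Keyword stand-in for the intent classifier."""
    q = question.lower()
    hits = {shape for shape, cues in CUES.items() if any(c in q for c in cues)}
    if len(hits) > 1:
        return Shape.HYBRID  # touches more than one data shape
    if hits:
        return next(iter(hits))
    return Shape.PROCEDURAL  # assumed default when nothing matches


def route(question: str) -> str:
    return ROUTES[classify(question)]


print(route("If pump P-204 fails, what tanks am I going to lose?"))
```

Keeping the routing table separate from the classifier means the classifier can be retrained per plant without touching the retrieval paths.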

What we'd build differently next time

Three things, in order of regret.

Build the graph layer first. We didn't. We built vector RAG first because it was familiar, and the agent looked smart on day one and useless by week three because it couldn't answer the structural questions that mattered. If we were starting again, the graph and the SQL tooling come before the vector store.

Make the embedder domain-tuned from the start. A generic embedder on industrial text loses 8-12 points of retrieval accuracy versus a domain-tuned one. Fine-tuning the embedder is two weeks of work. We deferred it twice and paid for it twice.
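A gap like this is only visible if it is measured on a fixed question set before and after tuning. The sketch below computes hit rate at k, one common retrieval metric; the article does not say which metric was used, so that choice is an assumption, and the rankings are toy data that do not reproduce the figures above.

```python
def hit_rate_at_k(
    ranked: dict[str, list[str]], relevant: dict[str, set[str]], k: int
) -> float:
    """Fraction of queries with at least one relevant document in the top k."""
    if not ranked:
        raise ValueError("no queries to evaluate")
    hits = sum(
        1 for qid, docs in ranked.items() if set(docs[:k]) & relevant[qid]
    )
    return hits / len(ranked)


# Toy gold labels and toy rankings from two hypothetical embedders.
RELEVANT = {"q1": {"sop-12"}, "q2": {"sop-31"}, "q3": {"sop-07"}, "q4": {"sop-44"}}
GENERIC = {
    "q1": ["sop-12", "sop-02"],
    "q2": ["sop-09", "sop-31"],
    "q3": ["sop-01", "sop-02"],
    "q4": ["sop-03", "sop-05"],
}
TUNED = {
    "q1": ["sop-12", "sop-02"],
    "q2": ["sop-31", "sop-09"],
    "q3": ["sop-07", "sop-02"],
    "q4": ["sop-03", "sop-05"],
}

for name, run in (("generic", GENERIC), ("tuned", TUNED)):
    print(name, hit_rate_at_k(run, RELEVANT, k=1))
```

Running the same evaluation against both embedders on the same questions is what turns "the tuned model feels better" into a number that justifies the two weeks of work.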

Treat the orchestration plan as a product surface. Operators want to see what the agent is going to do before it does it, particularly for shape-4 questions. The plan UI is now standard in our deployments and is the feature operators mention most often when they describe what they like about the system.

The takeaway

Don't pick a retrieval religion. Pick the right tool per data shape and orchestrate them as one system. Industrial knowledge isn't free-text in disguise — it's free-text plus graph plus tables, and the systems that ignore that are the ones that don't make it past the pilot.

The Plant Brain is built this way. If you are wrestling with the same architecture problem, reach out.
