How Eximpedia Reconciled Half a Billion Supplier Records at 99% Accuracy

2 weeks

Time to prototype

500M+

Supplier records reconciled at 99% accuracy

90% faster

Data processing on Microsoft Fabric

<1s

Query latency across billions of records

By the time SeAir Exim brought us in, the data was already there. The accuracy wasn't. The company operates Eximpedia, an export-import trade intelligence platform serving thousands of customers (traders, customs brokers, supply-chain analysts) across more than one hundred countries. The corpus was already enormous: billions of supplier records, more than half a billion of which needed active standardisation. The legacy stack ran on ElasticSearch over AWS. At that scale, the same supplier showed up under different names, with missing fields and conflicting addresses, across customs declarations from different origin countries. For a customer trying to work out who really shipped what, that inconsistency is the problem. The team had already built the data moat; billions of records are the moat. What they didn't have was a reconciliation surface that could turn the moat into a trustworthy answer, or a query layer that could return that answer in human time.

What we built: three agents, one decision surface.

The new Eximpedia data platform is built end-to-end on Microsoft Fabric, migrated from the legacy AWS ElasticSearch stack. Trade records flow into Fabric's OneLake, and Azure OpenAI handles the reconciliation layer, applying entity resolution against the supplier graph to collapse duplicate records, fill in missing fields from corroborating declarations, and resolve conflicts where the same supplier appears under different names across countries. Power BI and a natural-language query surface sit on top, so a customer can ask the platform a question in plain language and get an answer against billions of records in under a second. GitHub Copilot accelerated the developer build itself, which mattered because the migration involved rewriting a substantial slice of the query and ingestion paths inside the engagement window.

Underneath, three coordinated stages run against every batch of incoming trade data.
The ingestion stage parses raw customs and shipment payloads from each origin country and lands them in OneLake.
The reconciliation stage runs Azure OpenAI against the half-billion-plus records requiring standardisation, applying entity resolution, address normalisation, and supplier-graph alignment, with confidence scores attached.
The serving stage exposes the reconciled corpus through both Power BI semantic models and a natural-language query interface, so customer-facing queries such as "show me every shipment of this HS code from these three ports in the last quarter, grouped by importer" return in under a second.

Crucially, we proved the approach before we scaled it. The initial proof-of-concept took 500,000 unique supplier records from a single country and a single trade type and reconciled them at 99% accuracy in two weeks. That result (500,000 records at 99% in two weeks) is what earned permission to apply the same architecture across more than one hundred countries, and ultimately to the full 500 million records that needed standardising.
The platform now processes data 90% faster than the legacy ElasticSearch stack, and Microsoft features the build in its AI First Movers programme as a Fabric + Azure OpenAI reference deployment.
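The reconciliation step described above (collapse duplicates, fill missing fields from corroborating declarations, attach a confidence score) can be illustrated with a small stand-in. This is a minimal sketch under stated assumptions, not the production method: the source says the real layer uses Azure OpenAI, whereas this sketch swaps in plain string similarity from Python's `difflib` plus union-find clustering and majority-vote field filling. The names `LEGAL_SUFFIXES`, `resolve`, the 0.85 threshold, the weakest-link confidence rule, and the sample records are all hypothetical.

```python
import re
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical legal-form tokens that carry no identity signal.
LEGAL_SUFFIXES = {"co", "company", "corp", "gmbh", "inc", "limited", "llc", "ltd", "pvt", "sa"}


def normalise_name(name):
    """Lower-case, keep alphanumeric tokens, and drop legal-form tokens."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def similarity(a, b):
    """String similarity in [0, 1] between two normalised supplier names."""
    return SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()


def majority(values):
    """Most common non-empty value, or None if every record left the field blank."""
    present = [v for v in values if v]
    return Counter(present).most_common(1)[0][0] if present else None


def resolve(records, fields=("country", "address"), threshold=0.85):
    """Cluster near-duplicate suppliers and merge each cluster into one record."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    best = {}  # highest above-threshold similarity seen for each record
    for i, j in combinations(range(len(records)), 2):
        score = similarity(records[i]["name"], records[j]["name"])
        if score >= threshold:
            parent[find(i)] = find(j)
            best[i] = max(best.get(i, 0.0), score)
            best[j] = max(best.get(j, 0.0), score)

    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)

    merged = []
    for members in clusters.values():
        entry = {"aliases": [records[i]["name"] for i in members]}
        for field in fields:
            # Fill gaps from corroborating records by majority vote.
            entry[field] = majority(records[i].get(field) for i in members)
        # Confidence: the weakest member's best match (1.0 for singletons).
        entry["confidence"] = min(best.get(i, 1.0) for i in members)
        merged.append(entry)
    return merged


# Hypothetical declarations: three variants of one supplier, plus one unrelated supplier.
records = [
    {"name": "Minh Phat Trading Co., Ltd", "country": "VN", "address": None},
    {"name": "MINH PHAT TRADING COMPANY LIMITED", "country": "VN", "address": "12 Le Loi, HCMC"},
    {"name": "Minh Phat Tradng", "country": None, "address": "12 Le Loi, HCMC"},
    {"name": "Acme Exports Pvt Ltd", "country": "IN", "address": "Mumbai"},
]
for supplier in resolve(records):
    print(supplier)
```

Comparing every pair is quadratic, so at hundreds of millions of records any real pipeline would need some form of blocking (only comparing candidates that plausibly match) before pairwise scoring. The weakest-link confidence is one conservative choice among several.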

What changed: measured outcomes, recorded against the headwind.

99% reconciliation accuracy across 500 million-plus supplier records — verified at proof-of-concept on 500,000 records in a single country in two weeks, then scaled to 100+ countries.
90% improvement in data processing speed against the legacy ElasticSearch stack — the figure Microsoft cites in its AI First Movers programme.
Sub-second query response across billions of supplier records, replacing a query path where complex joins ran for minutes against the legacy stack.
Migration from AWS ElasticSearch to Microsoft Fabric end-to-end, with no observable disruption to the existing Eximpedia customer surface.
Featured publicly by Microsoft as an AI First Movers reference deployment for Fabric + Azure OpenAI in the trade-data category.
A natural-language query surface on top of the reconciled corpus — customers can ask plain-English questions against billions of records and get answers in under a second.
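The source does not say how the 99% figure was measured. One common convention for entity resolution, assumed here purely for illustration, is pairwise precision and recall on a hand-labelled sample: every pair of records placed in the same cluster counts as a prediction that they are the same supplier. The function names and the sample labels below are hypothetical.

```python
from itertools import combinations


def same_entity_pairs(labels):
    """Every unordered pair of record indices that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_scores(gold, predicted):
    """Pairwise precision and recall of predicted clusters against gold clusters.

    gold[k] and predicted[k] are cluster labels for the same record k.
    """
    gold_pairs = same_entity_pairs(gold)
    pred_pairs = same_entity_pairs(predicted)
    true_pos = len(gold_pairs & pred_pairs)
    # Convention: with no predicted (or no gold) pairs, there is nothing to get wrong.
    precision = true_pos / len(pred_pairs) if pred_pairs else 1.0
    recall = true_pos / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


# Hypothetical hand-labelled sample of five records.
gold = ["A", "A", "A", "B", "C"]   # records 0-2 are one supplier
predicted = [0, 0, 1, 2, 2]        # split supplier A, wrongly merged B and C
precision, recall = pairwise_scores(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")  # prints precision=0.50 recall=0.33
```

Reporting both numbers matters because a single accuracy figure can hide a trade-off: merging aggressively raises recall at the cost of precision, and splitting conservatively does the reverse.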

What we'd do differently. Honestly, two things.

First, the proof-of-concept worked too well, too fast. Reconciling 500,000 records at 99% in two weeks was the right shape of demonstration, but it set an internal expectation that the full 100-country, 500-million-record scale-out would run at the same per-record cost, and it didn't. Cross-country reconciliation has long-tail problems that single-country reconciliation never surfaces: a supplier in Vietnam shipping under a Romanised name to a buyer in Brazil who declares it under a Portuguese transliteration is a category of edge case the PoC dataset simply couldn't contain. Next time we'd scope the PoC against two countries with known transliteration drift, not one.

Second, we under-instrumented the natural-language query surface against the long tail of customer phrasings. Customers don't ask questions the way analysts write them, and the first month of live use produced a backlog of intent classes the schema didn't anticipate. Running the query surface in a logging-only mode for two weeks before opening it to traffic would have given us a better cold start.
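The Vietnam-to-Brazil edge case above is partly a character-level problem: the same name arrives with and without diacritics. What follows is a minimal sketch of one normalisation step that could run before any matching, assuming Unicode NFKD decomposition with combining marks stripped. The `EXTRA_FOLDS` table is a hypothetical, non-exhaustive addition for letters such as Vietnamese "đ", which have no Unicode decomposition and so survive NFKD unchanged.

```python
import unicodedata

# Hypothetical, non-exhaustive table: letters with no Unicode decomposition,
# which NFKD would otherwise leave untouched.
EXTRA_FOLDS = str.maketrans({"đ": "d", "Đ": "D", "ø": "o", "Ø": "O", "ł": "l", "Ł": "L"})


def fold(text):
    """Return an accent-free, case-folded, whitespace-collapsed comparison key."""
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA_FOLDS))
    # NFKD splits "á" into "a" plus a combining accent (category Mn); drop the accents.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return " ".join(base.casefold().split())


# Hypothetical declarations of one name, with and without diacritics.
vietnamese = fold("Công ty TNHH Đại Phát")
romanised = fold("CONG TY TNHH DAI PHAT")
print(vietnamese == romanised, vietnamese)  # prints: True cong ty tnhh dai phat
```

Folding only removes diacritic drift. It cannot reconcile genuinely different transliterations of the same name, which is exactly the tail the single-country PoC never exercised; those still need fuzzy or model-based matching downstream.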

By the numbers.

500M+

Supplier records reconciled

99%

Reconciliation accuracy

90% faster

Data processing vs. legacy ElasticSearch

<1s

Query latency across billions of records

100+

Countries served

2 wks

Initial PoC (500K records, 99% accuracy)