RAG-first AI enrichment. Zero cost on products you already know.
A two-phase pipeline that queries your supplier documents first (local Qdrant RAG), falls back to web search if needed (Perplexity or self-hosted SearXNG), then synthesizes with the LLM of your choice. BYOK on 14 providers.
The two-phase pipeline
Collect — RAG + Web
The pipeline first queries your indexed document base (RAG on local Qdrant — supplier technical sheets, manuals, certificates). If RAG coverage is sufficient (≥ 3 relevant chunks), no web call is made — zero AI cost. Otherwise, automatic fallback to a configurable web search (Perplexity or self-hosted SearXNG).
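The collect step can be sketched as follows. This is a minimal illustration, not PixeePIM's actual code: `Chunk`, `collect`, `search_rag` and `search_web` are hypothetical names, and the two search functions are passed in as plain callables so the sketch stays independent of any Qdrant, Perplexity or SearXNG client. Only the threshold of 3 relevant chunks comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List

# Threshold taken from the text above: >= 3 relevant chunks means "sufficient".
MIN_RELEVANT_CHUNKS = 3


@dataclass
class Chunk:
    text: str
    source: str  # "rag" for indexed documents, "web" for search results


def collect(
    product_query: str,
    search_rag: Callable[[str], List[Chunk]],
    search_web: Callable[[str], List[Chunk]],
) -> List[Chunk]:
    """Query the document index first; call web search only if coverage is short."""
    rag_chunks = search_rag(product_query)
    if len(rag_chunks) >= MIN_RELEVANT_CHUNKS:
        # Sufficient documentary coverage: the web search is never called.
        return rag_chunks
    # Insufficient coverage: keep what the index returned and add web results.
    return rag_chunks + search_web(product_query)
```

Documentary chunks are kept even when the fallback fires, so the synthesis step still sees them as the primary source.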
Synthesize — LLM
An LLM of your choice consolidates documentary data (primary source) and web data (secondary source) into an enriched sheet: SEO description, extracted technical attributes, dimensions, weight, image alt text, competitor prices and official videos. Documentary sources always take priority over web sources.
What's in the enrichment module
RAG-first — zero cost when documents are indexed
Index your supplier PDFs, technical sheets, manuals and certificates in local Qdrant. The enrichment pipeline queries them first. When coverage is sufficient (≥ 3 relevant chunks), no web search call is triggered, and the cost of enriching that product is zero.
Web fallback — Perplexity or self-hosted SearXNG
When RAG coverage is insufficient, the pipeline falls back to a configurable web search. Use Perplexity (managed) for quality, or SearXNG (self-hosted) for full data sovereignty: no third party knows what you’re searching.
LLM synthesis with priority rules
A configurable LLM (OpenAI, Anthropic, Mistral, local Ollama…) consolidates RAG and web sources. The synthesis prompt instructs the model to treat documentary data as more reliable than web data, so your supplier specs always win over a random Amazon listing.
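One way to express that priority rule in a prompt is to label each source block and state the rule up front. The sketch below is an assumption about how such a prompt could be assembled; the function name `build_synthesis_prompt`, the PRIMARY/SECONDARY labels and the wording of the rule are all hypothetical, not PixeePIM's actual prompt.

```python
from typing import List


def build_synthesis_prompt(doc_chunks: List[str], web_chunks: List[str]) -> str:
    """Assemble a prompt that ranks supplier documents above web results."""
    lines = [
        # Hypothetical wording of the priority rule.
        "Rule: when sources disagree, trust PRIMARY (supplier documents) "
        "over SECONDARY (web).",
        "",
        "PRIMARY sources:",
    ]
    lines += [f"- {chunk}" for chunk in doc_chunks] or ["- (none)"]
    lines += ["", "SECONDARY sources:"]
    lines += [f"- {chunk}" for chunk in web_chunks] or ["- (none)"]
    return "\n".join(lines)
```

Calling `build_synthesis_prompt(["Weight: 2.4 kg"], ["Weight: 3 kg"])` yields a prompt in which the supplier value appears under PRIMARY and the conflicting web value under SECONDARY, leaving the final arbitration to the LLM.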
14 BYOK AI providers
Bring your own keys for 14 native AI providers (OpenAI, Anthropic, DeepSeek, Gemini, Grok, Groq, Jina, Mistral, Ollama, OpenRouter, Perplexity, SearXNG, Together, DeepL). No markup, no per-token tax — your provider bills you directly.
11 AI contexts assignable to different providers
Optimize cost vs quality by context: cheap models for translation, premium for enrichment synthesis, local Ollama for compliance. Smart Router picks the right provider per task, and Cost Estimator predicts the bill before execution.
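Conceptually, a per-context assignment is a lookup table with a default. The sketch below is illustrative only: the context keys, the provider picked for each one and the `provider_for` helper are assumptions, and it does not describe how Smart Router works internally.

```python
from typing import Dict

# Hypothetical assignment table. Context names and provider choices are
# illustrative, not PixeePIM's actual configuration keys.
CONTEXT_PROVIDERS: Dict[str, str] = {
    "translation": "DeepL",             # assumed "cheap" choice
    "enrichment_synthesis": "Anthropic",  # assumed "premium" choice
    "compliance": "Ollama",             # local model, as in the text above
}


def provider_for(context: str, assignments: Dict[str, str], default: str) -> str:
    """Return the provider assigned to a context, or the default if unassigned."""
    return assignments.get(context, default)
```

For example, `provider_for("compliance", CONTEXT_PROVIDERS, "OpenAI")` returns `"Ollama"`, while a context with no assignment falls back to the default.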
Custom prompt library
Define prompts per product category — your tone, mandatory attributes, SEO rules. Reused across the entire catalog. Override per supplier or per brand when needed.
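The override logic can be pictured as a layered lookup. In this sketch the precedence order (brand over supplier over category) is an assumption, since the text above does not state which override wins, and `PromptLibrary` with its field names is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PromptLibrary:
    by_category: Dict[str, str]
    by_supplier: Dict[str, str] = field(default_factory=dict)
    by_brand: Dict[str, str] = field(default_factory=dict)

    def resolve(
        self,
        category: str,
        supplier: Optional[str] = None,
        brand: Optional[str] = None,
    ) -> str:
        """Pick the most specific prompt available for a product."""
        # Assumed precedence: brand override, then supplier override,
        # then the category prompt shared across the catalog.
        if brand is not None and brand in self.by_brand:
            return self.by_brand[brand]
        if supplier is not None and supplier in self.by_supplier:
            return self.by_supplier[supplier]
        return self.by_category[category]
```

A product with no matching override simply gets its category prompt, which is what makes the category level reusable across the whole catalog.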
Why RAG-first changes the economics of AI enrichment
Most "AI enrichment" tools call an LLM on every product, every time, with web search bolted on. At 900k products and €0.01–0.05 per call, a single pass over the catalog costs €9,000 to €45,000, and the bill recurs with every re-enrichment. Meanwhile, the LLM is usually inventing details it doesn't actually know about your specific SKU.
PixeePIM's pipeline inverts the default. If you've indexed the supplier datasheet for a product, the RAG step retrieves the actual technical specs (weight, dimensions, certifications, materials) and the LLM only consolidates them. No hallucination, no web call, no recurring per-product cost. The only cost tied to that product is the one-time embedding cost, amortized across every future re-enrichment.
For distributors who already have PDFs from their suppliers, this is the difference between a recurring €5k/month AI bill and a one-time €200 indexing cost. The web fallback only kicks in for products where you genuinely have no data — typically new SKUs in onboarding.
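The arithmetic behind this can be checked with a few lines. The catalog size (900k) and the per-call range (€0.01 to €0.05, written as 1 and 5 cents) come from the figures above; the 90% RAG coverage used in the second scenario is purely an assumption for illustration, and `web_enrichment_cost_eur` is a hypothetical helper.

```python
def web_enrichment_cost_eur(
    catalog_size: int, rag_covered: int, cents_per_call: int
) -> float:
    """Cost of one enrichment pass when only uncovered products trigger a paid call."""
    if not 0 <= rag_covered <= catalog_size:
        raise ValueError("rag_covered must be between 0 and catalog_size")
    uncovered = catalog_size - rag_covered
    # Prices are kept in integer cents so the arithmetic stays exact.
    return uncovered * cents_per_call / 100


CATALOG = 900_000

# Every product triggers a paid call (no documents indexed).
no_rag_low = web_enrichment_cost_eur(CATALOG, 0, 1)    # 9000.0
no_rag_high = web_enrichment_cost_eur(CATALOG, 0, 5)   # 45000.0

# Assumption for illustration: 90% of products are covered by indexed documents.
covered = 810_000
rag_low = web_enrichment_cost_eur(CATALOG, covered, 1)   # 900.0
rag_high = web_enrichment_cost_eur(CATALOG, covered, 5)  # 4500.0
```

Under that assumed coverage, each pass costs a tenth of the all-LLM figure, and the saving repeats on every re-enrichment.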
AI enrichment is included in every paid plan
BYOK on your AI provider keys. No PixeePIM markup, no per-token tax — your provider bills you directly.