RAG architecture - design and implementation
Vector database (pgvector, Qdrant, Weaviate), embedding model, retrieval strategy, reranking, prompt engineering.
Retrieval-Augmented Generation built on the organisation’s knowledge base - SharePoint, Confluence, ERP, CRM, document repositories. GDPR data classification, source-aware access control, audit trail.
The model answers based on retrieved documents rather than from memory alone. Every answer carries a source reference.
The assistant only sees documents that a specific user is authorised to access. This limits the risk of leaking data subject to classification.
A new document added to the knowledge base is available immediately, without costly model fine-tuning.
Every answer is logged together with the context used: who asked and which sources were used. This matters from the perspective of the AI Act and GDPR.
Vector database (pgvector, Qdrant, Weaviate), embedding model, retrieval strategy, reranking, prompt engineering.
SharePoint, Confluence, Notion, Google Drive, SAP, Salesforce, custom repositories. Incremental synchronisation, freshness monitoring.
Identification of personal data, sensitivity classification, access policies, masking for restricted-access data.
Integration with Entra ID, AD, SAP IDM - the assistant inherits the user’s permissions. No risk of access-level escalation.
Measuring answer quality - relevance, faithfulness, citation accuracy. Ongoing tuning of prompts and retrieval strategy.
Testing for prompt injection via context, indirect injection via documents, EchoLeak, ShareLeak. OWASP LLM Top 10.
We start with a data audit: which documents should be available to the assistant, in what formats they exist, how they are classified, and what access policies apply within the organisation. On this basis we design the RAG architecture and the scope of the first rollout.
We implement the first use case for one department, typically within a 6-10 week horizon. We check answer quality, correctness of source citation, functioning of access control, and data-security risks.
After the pilot we scale the solution to further sources and user groups. We work as standard with Claude or Azure OpenAI, and for critical data we recommend local LLMs.
Technology stack
The team’s experience in AI, data integration and enterprise systems confirms SNOK’s readiness to implement RAG solutions.
Law firm
RAG over 50,000 documents - a legal assistant citing clauses
FMCG manufacturer
RAG over commercial contracts - assistant for the sales department, integrated with SAP
Technology company
RAG over technical documentation - assistant for L1/L2 customer support
RAG adds knowledge dynamically through retrieval - updates are immediate, and an audit trail is available. Fine-tuning "teaches" the model knowledge - more expensive, slower to update, without citation. In most cases RAG is the better choice.
It drastically reduces them - the model answers based on retrieved documents. But it can still misinterpret or miscite - which is why quality monitoring and citation grounding matter.
Document sanitisation before indexing, separation of concerns in prompts, output validation, sandboxing for tools the agent can invoke. A full AI Security review.
Yes - the RAG architecture is model-agnostic. We work with Claude, OpenAI, Gemini, and local LLMs (Llama, Mistral, Qwen) within the client’s infrastructure.