RAG over SharePoint, Confluence, and Google Drive: Permissions-First Architecture

TL;DR

The hard problem in enterprise RAG is permissions: the assistant must never surface a document to a user who is not allowed to see it. Embedding strips the access control list, so a naive embed-everything, retrieve-top-k pipeline matches on similarity, not authorization, and leaks. Enforce access at the retrieval layer, not the prompt: attach each document's ACL as versioned, filterable metadata on every chunk and filter the vector search by the user's identity and groups before retrieval, so the model only ever sees authorized chunks. Propagate the end-user's identity, not a service account; store group IDs, not user IDs; let deny override grant; and keep the ACLs fresh, because a revoked permission the index has not caught up with is a live leak. Treat the embedding itself as personal data.

The hard problem in enterprise RAG is not retrieval quality, it is permissions: the assistant must never surface a document to a user who is not allowed to see it. Embedding strips the access control list, so a naive "embed everything, retrieve top-k" pipeline matches on similarity, not authorization, and leaks. The fix is architectural. Enforce access at the retrieval layer, not in the prompt: attach each source document's ACL as versioned, filterable metadata on every chunk, and filter the vector search by the asking user's identity and group memberships before retrieval, so the model only ever sees authorized chunks. Propagate the end-user's identity, not a service account; store group IDs, not user IDs; let deny override grant; and keep the ACLs fresh, because a revoked permission that the index has not caught up with is a live leak.

This is the implementation companion to our RAG production-readiness checklist, which names per-user permissions as the hard security problem. This post is how you actually solve it. Vendor-specific and preview features move fast; re-check the linked docs before you build.

Why permissions are the hard part

The failure mode in one sentence: the model composes a fluent answer grounded in a document the asking user was never allowed to see, because retrieval matched on semantic similarity, not on authorization. Once a document is chunked and embedded into a shared index, the source system's owners, access levels, and sensitivity labels do not travel with the vector unless you deliberately carry them. A vector store has no concept of who is asking.

Microsoft 365 Copilot is the canonical real-world case. It does not break permissions; it respects them. The problem is that it makes content that was already overshared but hard to find trivially discoverable, so latent over-permissioning that nobody noticed becomes a one-prompt query. Microsoft's own remediation is telling: tools that pull risky content out of the assistant's reach (restricted content discovery, data-access governance reports) rather than asking the model to behave.

That is the key point: telling the model not to reveal a document is not a control. OWASP's guidance on sensitive-information disclosure states plainly that system-prompt restrictions may not be honored and can be bypassed by prompt injection. Prompt injection sits at the top of OWASP's Top 10 for LLM applications, and it can arrive indirectly, hidden inside a retrieved document. Researchers have demonstrated near-total evasion of production guardrails, and a 2026 prompt-injection chain in Copilot's enterprise search worked precisely because a leak-prevention safeguard only applied to the first request. The boundary has to be the architecture, not the prompt.

The principle: enforce at the retrieval layer

The vector store must never return an unauthorized document in the first place, and the model must never process it. You achieve that by tagging each chunk with normalized, versioned ACL metadata at index time and including a permission filter in the query at retrieval time. Filtering after retrieval, in the application layer or the prompt, fails three ways: it wastes the context window fetching documents only to discard them; if the app logic has a bug or is circumvented by an injection, the documents flow through anyway; and it quietly breaks recall, because stripping unauthorized chunks out of a top-k set can leave the model with nothing when the authorized results sat just outside k. Pre-filter at the search. If your store cannot, over-fetch several times k and filter inside the retrieval service, before any chunk reaches the application or the prompt.
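The recall problem is easy to see on a toy index. The sketch below is illustrative only: the Chunk type, the scores, and the group names are made up, and a real vector store would apply the filter inside its approximate search rather than over a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    score: float        # similarity to the query, higher is better
    allowed: frozenset  # principal IDs allowed to read the source document


def post_filter(chunks, principals, k):
    """Take the top-k by similarity, then drop what the user may not see."""
    top_k = sorted(chunks, key=lambda c: c.score, reverse=True)[:k]
    return [c for c in top_k if c.allowed & principals]


def pre_filter(chunks, principals, k):
    """Restrict the candidates to authorized chunks, then take the top-k."""
    visible = [c for c in chunks if c.allowed & principals]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]


# Hypothetical index: the three best matches belong to HR only.
INDEX = [
    Chunk("hr-1", 0.95, frozenset({"g-hr"})),
    Chunk("hr-2", 0.93, frozenset({"g-hr"})),
    Chunk("hr-3", 0.91, frozenset({"g-hr"})),
    Chunk("eng-1", 0.80, frozenset({"g-eng"})),
    Chunk("eng-2", 0.78, frozenset({"g-eng"})),
]
engineer = frozenset({"g-eng"})

print([c.chunk_id for c in post_filter(INDEX, engineer, k=3)])  # []
print([c.chunk_id for c in pre_filter(INDEX, engineer, k=3)])   # ['eng-1', 'eng-2']
```

With post-filtering the engineer gets an empty context even though two relevant, authorized chunks exist; with pre-filtering the unauthorized chunks never enter the candidate set at all.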

Per-source permission models

Each source has its own ACL model and its own traps. You ingest with an elevated identity that can read all content and its permissions, normalize every source to the same shape, and resolve to group IDs.

Source	How permissions work	The trap to handle
SharePoint / OneDrive (Microsoft Graph)	Permissions cascade site to library to item; read ACLs via the item permissions endpoint using grantedToV2	Sharing one file silently breaks inheritance; "Limited Access" is plumbing, not a read grant; sensitivity labels add a second gate (the user needs the EXTRACT and VIEW rights, not just the SharePoint ACL)
Confluence	Effective view access is the intersection of space permission and page restriction; view restrictions inherit to child pages	The Cloud restrictions API does not return inherited restrictions, so you must walk every ancestor and apply each one's read restriction as an additional gate (a reader has to pass all of them), or you leak restricted child pages
Google Drive	Six roles, grantee types user/group/domain/anyone; read ACLs via the permissions list with permissionDetails for inheritance	Shared drives are strictly expansive (a child cannot have less access than its parent); a parent permission change records no child change-log entry, so you must re-propagate
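To make the Confluence trap concrete, here is a minimal sketch of an ancestor walk. It does not call the Confluence API; the Page type, the page tree, and the principal names are hypothetical, and it assumes you have already fetched each page's own read restriction and parent. The effective ACL is modelled as a list of gates, where a reader must hold at least one principal from every gate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Page:
    page_id: str
    parent_id: Optional[str]
    # Principals named in this page's own read restriction; empty means none set.
    read_restriction: frozenset = frozenset()


def effective_read_gates(page_id, pages, space_viewers):
    """Collect every gate a reader must pass: the space permission plus the
    read restriction of the page itself and of each restricted ancestor."""
    gates = [space_viewers]
    seen = set()
    current = page_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in page tree at {current}")
        seen.add(current)
        page = pages[current]
        if page.read_restriction:
            gates.append(page.read_restriction)
        current = page.parent_id
    return gates


def can_view(principals, gates):
    """A reader needs at least one matching principal in every gate."""
    return all(gate & principals for gate in gates)


# Hypothetical tree: "budget" has no restriction of its own but sits under a
# page restricted to the finance group.
PAGES = {
    "root": Page("root", None),
    "finance": Page("finance", "root", frozenset({"group:finance"})),
    "budget": Page("budget", "finance"),
}
SPACE_VIEWERS = frozenset({"group:staff"})

gates = effective_read_gates("budget", PAGES, SPACE_VIEWERS)
print(can_view(frozenset({"group:staff"}), gates))                   # False
print(can_view(frozenset({"group:staff", "group:finance"}), gates))  # True
```

Reading only the restriction on "budget" would have made the page look open to all staff. Note that a list of gates is a conjunction, so the index filter has to require a match in each one rather than in a single flattened allow-list.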

One detail that bites teams: against Microsoft Graph, code against grantedToV2 and grantedToIdentitiesV2; the older grantedTo fields are deprecated. And permission visibility is itself access-trimmed, so a full ACL crawl needs an application identity with the right read-all scopes, not a delegated user token.
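As a rough sketch of that normalization, the function below pulls user and group IDs out of permission objects shaped like the Graph permission resource (roles, grantedToV2, grantedToIdentitiesV2). The sample payloads are invented for illustration, the READ_ROLES allow-list is an assumption you should check against what your tenant actually returns, and site-level identities (siteUser, siteGroup) are deliberately left out because they need their own resolution step.

```python
# Assumption: only these role values count as a read grant. Anything else,
# including plumbing entries, is ignored rather than treated as access.
READ_ROLES = {"read", "write", "owner"}


def principals_from_permission(perm):
    """Return normalized 'kind:id' principals granted read access by one
    permission object, or an empty set if it carries no read-capable role."""
    if not READ_ROLES & set(perm.get("roles", [])):
        return set()
    identity_sets = list(perm.get("grantedToIdentitiesV2", []))
    if "grantedToV2" in perm:
        identity_sets.append(perm["grantedToV2"])
    principals = set()
    for identity_set in identity_sets:
        for kind in ("user", "group"):
            identity = identity_set.get(kind)
            if identity and identity.get("id"):
                principals.add(f"{kind}:{identity['id']}")
    return principals


def effective_allow(permissions):
    """Union the read principals across all permission objects on an item."""
    allow = set()
    for perm in permissions:
        allow |= principals_from_permission(perm)
    return allow


# Hypothetical payloads in the shape of a permissions listing.
SAMPLE = [
    {"id": "1", "roles": ["read"],
     "grantedToV2": {"group": {"id": "g-111", "displayName": "Finance"}}},
    {"id": "2", "roles": ["write"],
     "grantedToIdentitiesV2": [{"user": {"id": "u-222"}}, {"user": {"id": "u-333"}}]},
    # Stand-in for a non-read entry; the role string here is made up.
    {"id": "3", "roles": ["some-non-read-role"],
     "grantedToV2": {"user": {"id": "u-444"}}},
]

print(sorted(effective_allow(SAMPLE)))
# ['group:g-111', 'user:u-222', 'user:u-333']
```

Keying on the object ID rather than displayName keeps the normalized ACL stable when a user or group is renamed.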

Implementation patterns that hold up

Store group IDs on the chunk, not user IDs. Lower cardinality, and a membership change needs no re-index, only query-time group resolution. Resolve transitive, nested membership on the principal side.
Retrieve with the end-user's identity. Ingest with a service identity, but filter at query time with the end user's identity and group claims, propagated via OAuth on-behalf-of. Use the immutable object ID and group object IDs for authorization, never a mutable display name. On-behalf-of works only for users, not service principals, so a service-account-only chain cannot do per-user trimming.
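Pre-filter at the ANN query with a set-membership test. Pass the user's allowed principals to a set function (your vector store's equivalent of search.in or an $in filter) rather than a long chain of equality ORs, which can be seconds slower. Let an explicit deny override any grant.
Handle the Entra groups-overage. If a user is in more than about 200 groups, the identity token drops the groups claim and emits an overage pointer; the retrieval layer must detect it and fetch the full list from the directory, or high-group-count users are silently filtered against an incomplete group set: they miss documents they are entitled to, and any deny entry tied to a missing group is never applied.
pgvector with Postgres row-level security is a clean primitive: a select policy appends a permission predicate to every query, including the similarity search, so application code cannot skip it, provided the application connects as a role that is not a superuser, does not have BYPASSRLS, and is not the table owner (unless row security is forced on the table). Watch latency, because the policy can fight the approximate index; mitigate with per-tenant partial indexes or partitioning. Set the identity variable per request, and scope it to the transaction if you run behind a connection pooler, so one user's identity never carries over to the next request on a reused connection.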
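A minimal sketch of the query-time side of these patterns follows. It assumes already-validated token claims passed in as a dict, a caller-supplied fetch_transitive_groups function (hypothetical; in practice a directory lookup), chunk metadata with allow and deny lists, and a generic $in/$nin filter dialect that you would translate to your own store's syntax.

```python
def resolve_group_ids(claims, fetch_transitive_groups):
    """Return the user's group IDs, falling back to a directory lookup when
    the token signals a groups overage instead of listing the groups."""
    claim_names = claims.get("_claim_names") or {}
    if "groups" in claim_names:
        # Overage: the token carries a pointer, not the group list.
        return set(fetch_transitive_groups(claims["oid"]))
    return set(claims.get("groups", []))


def principal_set(user_oid, group_ids):
    """Immutable IDs only; display names never take part in authorization."""
    return {f"user:{user_oid}"} | {f"group:{g}" for g in group_ids}


def build_acl_filter(principals):
    """Filter in a generic $in/$nin dialect (assumed, adapt to your store):
    at least one allowed principal matches, and no denied principal matches."""
    ordered = sorted(principals)
    return {"$and": [{"allow": {"$in": ordered}}, {"deny": {"$nin": ordered}}]}


def is_visible(chunk_meta, principals):
    """Reference semantics of the filter above: deny overrides grant."""
    allowed = set(chunk_meta.get("allow", []))
    denied = set(chunk_meta.get("deny", []))
    return bool(allowed & principals) and not (denied & principals)


# Hypothetical claims; the second token shows the overage shape.
normal = {"oid": "u-1", "groups": ["g-a", "g-b"]}
overage = {"oid": "u-1", "_claim_names": {"groups": "src1"}}


def fake_directory(oid):
    # Stand-in for a real transitive-membership lookup.
    return ["g-a", "g-b", "g-c"]


me = principal_set("u-1", resolve_group_ids(overage, fake_directory))
print(is_visible({"allow": ["group:g-c"], "deny": []}, me))             # True
print(is_visible({"allow": ["group:g-a"], "deny": ["group:g-c"]}, me))  # False
```

Without the overage branch the second chunk would have been returned, because the deny entry refers to a group the truncated token never mentioned.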

There is a genuine fork in the guidance worth naming. One camp (including AWS) argues that query-time metadata filtering alone is not enough and you should re-check authorization against the source at retrieval, because synced metadata goes stale. The other (including Microsoft's reference patterns) syncs ACLs into the index and enforces with the user's token at query time for throughput. It is a real trade-off: freshness versus latency. Pick by how sensitive the data is.
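One way to split the difference is to trust the synced ACL for ordinary content and add a live check only for sensitive labels. The sketch below assumes each retrieved chunk carries a doc_id and a label, and that can_read_live is a caller-supplied function that asks the source system; both the field names and that function are hypothetical.

```python
def authorize_candidates(candidates, user_id, can_read_live,
                         live_check_labels=frozenset({"confidential"})):
    """Keep pre-filtered candidates, re-checking sensitive ones at the source.

    Each document is checked at most once per query, so the added latency
    scales with the number of sensitive documents, not the number of chunks.
    """
    decisions = {}
    authorized = []
    for chunk in candidates:
        if chunk["label"] in live_check_labels:
            doc_id = chunk["doc_id"]
            if doc_id not in decisions:
                decisions[doc_id] = can_read_live(user_id, doc_id)
            if not decisions[doc_id]:
                continue  # index said yes, source says no: stale ACL
        authorized.append(chunk)
    return authorized


# Hypothetical candidates that already passed the index-level filter.
CANDIDATES = [
    {"chunk_id": "a-1", "doc_id": "doc-a", "label": "general"},
    {"chunk_id": "b-1", "doc_id": "doc-b", "label": "confidential"},
    {"chunk_id": "b-2", "doc_id": "doc-b", "label": "confidential"},
    {"chunk_id": "c-1", "doc_id": "doc-c", "label": "confidential"},
]
live_calls = []


def fake_source_check(user_id, doc_id):
    # Stand-in for a call to the source system; doc-b was revoked after sync.
    live_calls.append(doc_id)
    return doc_id != "doc-b"


kept = authorize_candidates(CANDIDATES, "u-1", fake_source_check)
print([c["chunk_id"] for c in kept])  # ['a-1', 'c-1']
print(live_calls)                     # ['doc-b', 'doc-c']
```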

The freshness problem nobody budgets for

When permissions are materialized at ingest, a source-side revocation is not automatically propagated, so a revoked user may still see a document in the index until the next sync. There are two leak windows: the sync window between a source change and the index update, and the inherited-scope blind spot, where a tool refreshes ACLs on items with unique permissions but misses changes that came from a parent scope. SharePoint needs an explicit permissions resync for inherited changes; Google Drive's shared-drive parent changes leave no child change-log entry. Use the source change feeds (Google Drive's Changes API fires on permission changes, not just edits), and either materialize faster with webhooks or do a live last-mile check, per the fork above.
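The inherited-scope blind spot is worth sketching, because it is the part that change feeds do not do for you. The example below assumes you keep your own parent-to-children map of the source hierarchy and a per-document list of chunk metadata; fetch_effective_acl is a hypothetical function that reads the current effective ACL from the source.

```python
def apply_acl_change(index, children, item_id, fetch_effective_acl, version):
    """Refresh ACL metadata for an item and everything that inherits from it.

    index:    doc_id -> list of chunk metadata dicts (containers have none)
    children: item_id -> list of child item IDs
    Returns the IDs that were re-read, for logging.
    """
    stack = [item_id]
    touched = []
    while stack:
        current = stack.pop()
        acl = fetch_effective_acl(current)
        for chunk in index.get(current, []):
            chunk["allow"] = sorted(acl)
            chunk["acl_version"] = version
        touched.append(current)
        # A change on a parent scope may emit no event for the children,
        # so descend explicitly instead of waiting for one.
        stack.extend(children.get(current, []))
    return touched


# Hypothetical state: a folder's sharing was tightened to the finance group.
index = {
    "doc-1": [{"chunk_id": "c1", "allow": ["group:all"], "acl_version": 1}],
    "doc-2": [{"chunk_id": "c2", "allow": ["group:all"], "acl_version": 1}],
}
children = {"folder-A": ["doc-1", "folder-B"], "folder-B": ["doc-2"]}

touched = apply_acl_change(
    index, children, "folder-A",
    fetch_effective_acl=lambda item: {"group:finance"},
    version=2,
)
print(sorted(touched))  # ['doc-1', 'doc-2', 'folder-A', 'folder-B']
print(index["doc-2"])   # {'chunk_id': 'c2', 'allow': ['group:finance'], 'acl_version': 2}
```

Stamping an acl_version on every chunk also gives you something to query for later: any chunk still on an old version after a resync is a candidate stale entry.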

Deletion is the other half. Right to erasure means deleting at every layer: the source object, every derived chunk, and every embedding, plus caches. Design chunk IDs up front so deleting a document and all its chunks is one cheap operation. And treat the embedding itself as personal data: embedding-inversion research has reconstructed the large majority of short source texts from the vector alone, so a delete that leaves the embedding behind has not actually forgotten anything.
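One possible ID scheme, sketched under the assumption that your store can delete by ID prefix or by an equivalent document-key filter: derive a stable document key from the source and document ID, and append the chunk ordinal. The dictionaries below stand in for the vector store and a response cache.

```python
import hashlib


def doc_key(source, doc_id):
    """Stable, opaque key shared by every chunk of one source document."""
    return hashlib.sha256(f"{source}:{doc_id}".encode()).hexdigest()[:16]


def chunk_id(source, doc_id, ordinal):
    """Chunk ID = document key + zero-padded position of the chunk."""
    return f"{doc_key(source, doc_id)}:{ordinal:05d}"


def delete_document(store, cache, source, doc_id):
    """Remove every chunk, embedding, and cached entry derived from a document.
    Returns the number of chunks removed."""
    prefix = doc_key(source, doc_id) + ":"
    doomed = [cid for cid in store if cid.startswith(prefix)]
    for cid in doomed:
        del store[cid]        # text and embedding go together
        cache.pop(cid, None)  # derived copies must go too
    return len(doomed)


# Hypothetical in-memory stand-ins for the vector store and a cache.
store = {chunk_id("drive", "file-1", i): {"text": "...", "embedding": [0.0]} for i in range(3)}
store[chunk_id("drive", "file-2", 0)] = {"text": "...", "embedding": [0.0]}
cache = {chunk_id("drive", "file-1", 0): "cached snippet"}

print(delete_document(store, cache, "drive", "file-1"))  # 3
print(len(store), len(cache))                            # 1 0
```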

GDPR and residency

Permissions-first RAG maps cleanly onto GDPR. Data minimisation argues against bulk-vectorizing everything reachable; retrieve and embed only what the assistant needs. The accountability principle is best served by a concrete artifact: a per-query retrieval audit log recording which documents were surfaced to which identity for which query, which also lets you investigate a suspected stale-ACL leak after the fact. And where the data lives still matters, so route inference to an EU region and sign a data processing agreement with the model provider, which we cover in EU data residency for AI apps in 2026.
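What such an audit record might look like, as a sketch: one JSON line per query, capturing the identity, the query, and the documents surfaced along with the ACL version they were authorized under. The field names are illustrative, and whether you store the raw query, a redacted form, or only a hash is a decision for your own data-protection review.

```python
import json
from datetime import datetime, timezone


def audit_record(user_oid, query, chunks, now=None):
    """Serialize one retrieval decision as a JSON line."""
    now = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "ts": now.isoformat(),
            "user_oid": user_oid,
            "query": query,
            "results": [
                {
                    "doc_id": c["doc_id"],
                    "chunk_id": c["chunk_id"],
                    "acl_version": c["acl_version"],
                }
                for c in chunks
            ],
        },
        sort_keys=True,
    )


# Hypothetical retrieval result.
line = audit_record(
    "u-1",
    "What is the travel policy?",
    [{"doc_id": "doc-7", "chunk_id": "doc-7:00001", "acl_version": 4}],
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
print(line)
```

Recording the ACL version alongside each result is what makes a later stale-ACL investigation tractable: you can ask which answers were served under a version that predates a known revocation.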

The reference architecture, and the anti-patterns

End to end: ingest each source with an identity that can read all content and ACLs; extract the effective ACL per item (honoring broken inheritance, walking Confluence ancestors, reading Drive inheritance); chunk, embed, and attach the normalized versioned ACL as filterable metadata on every chunk; retrieve with the end-user identity and a mandatory permission filter; generate only from authorized chunks; log every decision; and run a freshness loop off the source change feeds with explicit resync for inherited changes and cascading deletes to chunks and embeddings.

The anti-patterns, most of which are real incidents waiting to happen: one shared index with no permission filter; filtering after retrieval or in the prompt; trusting the model to self-censor; retrieving with a service-account identity, which kills per-user trimming; storing per-user IDs instead of group IDs; treating SharePoint "Limited Access" as a read grant; reading Confluence restrictions without walking ancestors; mirroring only SharePoint ACLs while ignoring label encryption; updating only on full crawls so revocations linger; and deleting the source document but leaving the embedding.

"The model is not your access control, and the prompt is not your security boundary. If a user is not allowed to read a document, the vector store must never return it. Everything else, the labels, the freshness, the audit log, is in service of that one rule."

Frequently Asked Questions

How do you enforce permissions in RAG?

Attach each source document's ACL (allowed users and groups) as versioned, filterable metadata on every chunk at index time, then filter the vector search by the asking user's identity and group memberships before retrieval. The model only ever sees authorized chunks. Never enforce in the prompt or after retrieval.

Does RAG respect SharePoint permissions?

Not by default, because embedding strips the ACL. It respects them only if you ingest the permissions from Microsoft Graph alongside the content (reading grantedToV2, honoring broken inheritance, and checking the sensitivity-label EXTRACT and VIEW rights) and security-trim at query time with the signed-in user's identity. Microsoft 365 Copilot does this natively.

What is security trimming in RAG?

Removing results the current user is not allowed to see. Early-binding security trimming applies the user's allowed-principal set as a mandatory filter at the search, so unauthorized documents are never returned, as opposed to post-filtering, which fetches then discards and is both slower and leak-prone.

How do you implement per-user permissions in RAG?

Ingest with a service identity, but retrieve with the end-user's identity via OAuth on-behalf-of, using the immutable object ID and group claims. Store group IDs on chunks, resolve the user to groups at query time, pre-filter the vector search, and let deny override grant.

Should I store user IDs or group IDs on each chunk?

Group IDs. They are lower cardinality, and a membership change needs no re-index, only query-time group resolution. Resolve transitive, nested group membership on the principal side at query time.

How do I keep permissions fresh so I do not leak after a revocation?

Use source change feeds (Google Drive's Changes API, SharePoint delta plus indexer, Confluence polling) for incremental ACL sync, and explicitly resync inherited or parent-scope changes, the common blind spot. Either materialize faster with webhooks or do a live last-mile authorization check at query time.

Can I just tell the model not to reveal restricted documents?

No. OWASP states that system-prompt restrictions may not be honored and can be bypassed by prompt injection, and researchers have shown near-total evasion of production guardrails. Access control must be architectural, at the retrieval layer.

Can pgvector enforce per-user access?

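Yes. Postgres row-level security adds a permission predicate to every query, including the similarity query, so application code cannot skip it, as long as the application connects as a role that is not a superuser, does not have BYPASSRLS, and is not the table owner (unless row security is forced on the table). Inject the identity via a session variable, scoped to the transaction when connections are pooled. Watch latency, because the policy can fight the approximate index; mitigate with per-tenant partial indexes or partitioning.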

Are document embeddings personal data under GDPR?

Treat them as such. Embedding-inversion attacks reconstruct the large majority of short source texts from the vector alone, so an embedding can identify the data subject. Right to erasure therefore requires deleting the embedding too, not just the source document.

How does Confluence permission ingestion differ?

Effective view access is the intersection of space permission and page restriction, and view restrictions inherit to child pages. The Cloud restrictions API does not return inherited restrictions, so you must walk the ancestors and apply each one's read restriction as an additional gate that the reader has to pass, or you will leak restricted child pages.

Final thoughts

Enterprise RAG over your own document stores lives or dies on one rule: if a user cannot read a document, the retriever must never return it, and the model must never see it. Embedding strips the ACL, the prompt cannot be trusted to put it back, and the leak window after a revocation is real.

So build permissions-first. Carry each source's ACL as versioned metadata onto every chunk, retrieve with the end-user's identity and a mandatory filter, keep the ACLs fresh from the source change feeds, treat the embedding as personal data, and log what the assistant showed to whom. Get that architecture right and the rest of RAG is the easy part.
