I'm looking for advice on the best way to make my SharePoint data (around 2 GB of PDF, PPTX, and DOCX files) accessible to AI. I understand the data needs to be indexed before it can be queried or used through a chat interface. I've read about the SharePoint indexer in Azure AI Search and its app registration requirements, but I'm curious about the options for exposing the indexed data to users in a user-friendly way. Ideally I'd like to avoid extra per-user licensing costs, but I'm open to suggestions.
You need to decide whether you want to manage a full RAG (Retrieval-Augmented Generation) stack yourself or stick with the M365 suite. If users already have E3/E5 or Business Premium licenses, the easiest route is to index the SharePoint content into Azure AI Search and build a simple web app or Power App on top of it. Authentication stays tied to Entra ID, so you don't need extra per-user licenses, although Azure AI Search and Azure OpenAI are billed on consumption, and a Power App that calls a custom API needs a premium connector, which does require a per-user Power Apps license. Concretely: set up the SharePoint indexer in Azure AI Search, connect the index to an Azure OpenAI model for the generation step, and build a front-end app that calls your API. This setup balances control and simplicity nicely.
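To make the indexer step more concrete, here is a minimal sketch that only builds the REST request bodies for an Azure AI Search data source and indexer pointed at a SharePoint site. It assumes the preview SharePoint indexer as I understand its docs: data source `type` `"sharepoint"`, a semicolon-delimited connection string, and `defaultSiteLibrary` as the container. The API version, service name, site URL, and resource names are all hypothetical placeholders, so check the current documentation before relying on any of them.

```python
import json

# Hypothetical placeholders: substitute your own values.
SEARCH_SERVICE = "my-search-service"
API_VERSION = "2024-05-01-preview"  # assumption: a preview API version that supports the SharePoint indexer


def sharepoint_connection_string(site_url: str, app_id: str, app_secret: str, tenant_id: str) -> str:
    """Build the semicolon-delimited connection string the SharePoint indexer expects."""
    return (
        f"SharePointOnlineEndpoint={site_url};"
        f"ApplicationId={app_id};"
        f"ApplicationSecret={app_secret};"
        f"TenantId={tenant_id}"
    )


def datasource_definition(name: str, connection_string: str) -> dict:
    """Body for POST .../datasources: index the site's default document library."""
    return {
        "name": name,
        "type": "sharepoint",
        "credentials": {"connectionString": connection_string},
        "container": {"name": "defaultSiteLibrary", "query": None},
    }


def indexer_definition(name: str, datasource: str, index: str) -> dict:
    """Body for POST .../indexers: extract text and metadata from PDF/DOCX/PPTX only."""
    return {
        "name": name,
        "dataSourceName": datasource,
        "targetIndexName": index,
        "parameters": {
            "configuration": {
                "indexedFileNameExtensions": ".pdf, .docx, .pptx",
                "dataToExtract": "contentAndMetadata",
            }
        },
    }


if __name__ == "__main__":
    conn = sharepoint_connection_string(
        "https://contoso.sharepoint.com/sites/Docs",  # hypothetical site
        "<app-id>", "<app-secret>", "<tenant-id>",
    )
    url = f"https://{SEARCH_SERVICE}.search.windows.net/datasources?api-version={API_VERSION}"
    print("POST", url)
    print(json.dumps(datasource_definition("sharepoint-ds", conn), indent=2))
    print(json.dumps(indexer_definition("sharepoint-indexer", "sharepoint-ds", "sharepoint-index"), indent=2))
```

The requests themselves also need an admin `api-key` header (or an Entra ID token), and the target index has to exist first. Keeping the client secret in the connection string is the simplest option for a sketch; in production you'd want to keep it in Key Vault instead.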
Indexing isn't the hard part; the challenge is making sure SharePoint permissions are still respected (security trimming) while giving users seamless retrieval. A good approach is to keep SharePoint as the source of truth and run a lightweight RAG layer that handles search and the interface. This keeps you flexible and avoids surprise licensing costs later.
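One common way to do security trimming in Azure AI Search is a query-time filter. The sketch below assumes each indexed document carries a hypothetical filterable collection field `group_ids` listing the Entra ID group object IDs allowed to read it. It builds an OData filter with `search.in` so a user only retrieves documents shared with one of their groups. The field names `group_ids`, `title`, and `content` are illustrative, and filling `group_ids` is up to your pipeline: check whether your indexer version can ingest SharePoint permissions for you.

```python
from typing import Iterable

# Hypothetical field: a filterable Collection(Edm.String) holding the Entra ID
# group object IDs allowed to read each document. Your pipeline must fill it.
ACL_FIELD = "group_ids"


def security_filter(user_group_ids: Iterable[str], field: str = ACL_FIELD) -> str:
    """OData $filter keeping only documents shared with at least one of the user's groups."""
    ids = [g.strip() for g in user_group_ids if g.strip()]
    if not ids:
        # Fail closed: never send an unfiltered query for a user with no known groups.
        raise ValueError("no group IDs supplied; refusing to query without a filter")
    for g in ids:
        # Group object IDs are GUIDs; reject anything that could break the filter string.
        if "," in g or "'" in g:
            raise ValueError(f"unexpected character in group ID: {g!r}")
    joined = ",".join(ids)
    return f"{field}/any(g: search.in(g, '{joined}', ','))"


def search_body(query: str, user_group_ids: Iterable[str], top: int = 5) -> dict:
    """Body for POST /indexes/{index}/docs/search with security trimming applied."""
    return {
        "search": query,
        "filter": security_filter(user_group_ids),
        "top": top,
        "select": "title,content",  # hypothetical field names
    }


if __name__ == "__main__":
    # Hypothetical group ID; in practice take it from the signed-in user's token or Graph.
    print(search_body("travel policy", ["11111111-aaaa-bbbb-cccc-000000000001"]))
```

The top results from this query are what the RAG layer passes to the model as context. Because the filter is applied before retrieval, the model never sees chunks the user couldn't open in SharePoint.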
The standard approach in the Microsoft ecosystem is Azure AI Search. You'll need an app registration with the appropriate Graph API permissions, ideally **Sites.Selected**, which is safer than **Sites.Read.All** because it limits the app to the sites you explicitly grant it. It's been a couple of years since I worked on this, so things may have changed, but you'll likely still need those API permissions plus a suitable database or index to make your content accessible to LLMs. I'll keep an eye on this thread for updates!
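Note that **Sites.Selected** grants nothing by itself: after admin consent, an admin still has to grant the app access to each specific site. As I understand the Graph docs, you first resolve the site ID with `GET /sites/{hostname}:/{site-path}`, then call `POST /sites/{site-id}/permissions` using a token that has Sites.FullControl.All. The minimal sketch below only builds those URLs and the request body. The site path, app ID, and display name are hypothetical placeholders.

```python
import json

GRAPH = "https://graph.microsoft.com/v1.0"


def site_lookup_url(hostname: str, site_path: str) -> str:
    """URL to GET a site's ID, e.g. hostname 'contoso.sharepoint.com', path '/sites/Docs'."""
    return f"{GRAPH}/sites/{hostname}:/{site_path.strip('/')}"


def grant_request(site_id: str, app_id: str, display_name: str, role: str = "read") -> tuple[str, dict]:
    """URL and body for POST /sites/{site-id}/permissions, granting one app access to one site."""
    if role not in {"read", "write"}:
        raise ValueError("this sketch only covers 'read' and 'write'")
    body = {
        "roles": [role],
        "grantedToIdentities": [
            {"application": {"id": app_id, "displayName": display_name}}
        ],
    }
    return f"{GRAPH}/sites/{site_id}/permissions", body


if __name__ == "__main__":
    print("GET", site_lookup_url("contoso.sharepoint.com", "/sites/Docs"))
    url, body = grant_request("<site-id>", "<indexer-app-client-id>", "SharePoint indexer")
    print("POST", url)
    print(json.dumps(body, indent=2))
```

For an indexer, `read` should be enough. The separate admin token is only needed once per site, so the indexer's own app registration never holds tenant-wide rights.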
Thanks for the input! I appreciate the emphasis on **Sites.Selected**; scoping the app to only the sites it needs is a much safer permission model.
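If you're considering Microsoft 365 Copilot, it's designed for exactly this: it answers questions over the SharePoint content each user already has access to, with chat and search built in. Keep in mind, though, that full Microsoft 365 Copilot is a paid per-user add-on license, which runs against your goal of avoiding extra licensing costs.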

I appreciate your insights! It seems I should explore both paths you mentioned: setting up a classic RAG pipeline and staying within M365. Do you have any guides or resources for either approach?