
Create an Amazon Kendra GenAI index with OneDrive

Introduction

An Amazon Kendra GenAI index enhances semantic search by combining traditional search relevance with Large Language Model (LLM)-powered scoring to deliver accurate, contextual results. It goes beyond keyword matching to interpret the semantic meaning of queries and documents, so users can retrieve information with natural-language queries. This article demonstrates how to create an Amazon Kendra GenAI index using Microsoft OneDrive as a data source.

Data source

Microsoft OneDrive is a secure cloud storage platform that allows users to store, share, and collaborate on documents from any device. For this demonstration, OneDrive accounts contain documents that will be indexed using Amazon Kendra GenAI. The OneDrive connector in Amazon Kendra enables direct integration with user accounts, supporting authentication via OAuth 2.0 to access files for indexing and searching.

Prerequisites

  1. An AWS account with permissions to:
    • Create and manage an Amazon Kendra index
    • Configure AWS Secrets Manager
    • Set up IAM roles and permissions
  2. A Microsoft Entra ID account with the Application Developer or Global Administrator role, which is needed to register a Microsoft Entra ID app.

Register an app in Microsoft Entra ID

We will use OAuth 2.0 authentication to connect to the Microsoft OneDrive data source. The following values will be required and should be securely stored in AWS Secrets Manager.

  • clientId: OAuth app client ID
  • clientSecret: OAuth app client secret
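If you prefer to create the secret ahead of time rather than through the Kendra console, the following is a minimal sketch using boto3. It assumes the secret is a JSON object with the two keys listed above; the function names, secret name, and region are hypothetical placeholders, not values from this walkthrough.

```python
import json


def build_onedrive_secret(client_id: str, client_secret: str) -> str:
    """Return the JSON string to store as the secret value.

    Assumption: the connector reads the keys `clientId` and `clientSecret`,
    matching the two values listed in this article.
    """
    if not client_id or not client_secret:
        raise ValueError("client_id and client_secret must both be non-empty")
    return json.dumps({"clientId": client_id, "clientSecret": client_secret})


def store_onedrive_secret(name: str, client_id: str, client_secret: str, region: str) -> str:
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported lazily so the helper above works without the AWS SDK

    client = boto3.client("secretsmanager", region_name=region)
    response = client.create_secret(
        Name=name,
        SecretString=build_onedrive_secret(client_id, client_secret),
    )
    return response["ARN"]
```

A call such as `store_onedrive_secret("my-onedrive-secret", client_id, client_secret, "us-east-1")` would need AWS credentials that allow `secretsmanager:CreateSecret`.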

Perform the following steps to register an application in Microsoft Entra ID.

  1. Navigate to the Microsoft Azure Portal.
  2. Search for and click App registrations.
  3. Click New registration.
  4. Enter a name for your application, select who can use this application, and click Register.
  5. The application is created and its overview page opens. Note the application (client) ID and the directory (tenant) ID.
  6. Select Certificates & secrets in the navigation pane. Select Client secrets and then click New client secret.
  7. Enter a description, select an expiry period, and choose Add. Note the secret value right away; the portal displays it in full only once.
  8. Select API permissions in the navigation pane and click Add a permission. Select Microsoft Graph from the list of applications.
  9. Select Application permissions and then select the following permissions.
    • Files.Read.All
    • User.Read.All
    • Group.Read.All
    • Notes.Read.All
  10. Click Add permissions.
  11. Click Grant admin consent and select Yes for confirmation.
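Before handing the credentials to Amazon Kendra, it can help to check that the client ID and secret are accepted by Microsoft Entra ID. The sketch below requests a token with the OAuth 2.0 client credentials grant against the Microsoft identity platform token endpoint. It is an optional check, not part of the Kendra setup itself, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) the client-credentials token request."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": GRAPH_SCOPE,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL_TEMPLATE.format(tenant_id=tenant_id),
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(tenant_id: str, client_id: str, client_secret: str) -> str:
    """Send the request and return the access token from the JSON response."""
    request = build_token_request(tenant_id, client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["access_token"]
```

A successful response shows that the tenant ID, client ID, and client secret match. It does not by itself prove that admin consent was granted for every permission listed above.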

Create a GenAI index in Amazon Kendra

Create an index in Amazon Kendra to add Microsoft OneDrive as the data source.

  1. Navigate to the Amazon Kendra service in the AWS Console.
  2. Select Indexes in the navigation pane. Select Create index.
  3. Enter the index name, select the Create a new role (Recommended) option, and enter the Role name. Click Next.
  4. Select GenAI edition and click Next.
  5. Leave the default values for user access control and click Next.
  6. Review the index details and click Create.
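The console steps above can also be approximated with the AWS SDK. The following is a rough sketch using boto3, assuming the `GEN_AI_ENTERPRISE_EDITION` edition value corresponds to the GenAI edition selected in the console. Unlike the console, the API does not create the IAM role for you, so the role ARN passed in must already exist; the function names are hypothetical.

```python
def build_create_index_request(name: str, role_arn: str) -> dict:
    """Return keyword arguments for the Kendra CreateIndex call."""
    return {
        "Name": name,
        # Assumption: this edition value maps to "GenAI edition" in the console.
        "Edition": "GEN_AI_ENTERPRISE_EDITION",
        "RoleArn": role_arn,
    }


def create_genai_index(name: str, role_arn: str, region: str) -> str:
    """Create the index and return its ID. Requires AWS credentials."""
    import boto3  # imported lazily so the builder above works without the AWS SDK

    kendra = boto3.client("kendra", region_name=region)
    response = kendra.create_index(**build_create_index_request(name, role_arn))
    return response["Id"]


def wait_until_active(index_id: str, region: str, poll_seconds: int = 30) -> None:
    """Poll DescribeIndex until the index is ACTIVE, or raise if it fails."""
    import time

    import boto3

    kendra = boto3.client("kendra", region_name=region)
    while True:
        status = kendra.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return
        if status == "FAILED":
            raise RuntimeError(f"Index {index_id} failed to create")
        time.sleep(poll_seconds)
```

Index creation is asynchronous, so a data source should only be added once the index reports `ACTIVE`.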

Add OneDrive data source to the index

Once the index is provisioned, configure the OneDrive data source.

  1. Navigate to the Amazon Kendra service in the AWS Console.
  2. Select Indexes in the navigation pane.
  3. Select the newly created index and click Add data sources.
  4. Select the OneDrive connector and click Add connector.
  5. Enter the name for the data source and click Next.
  6. Enter the directory (tenant) ID that you noted when registering the app.
  7. Under the Authentication section, select Create and add a new secret.
  8. Enter a secret name, then the client ID and client secret value from the app registration. Click Add Secret.
  9. Select Create a new role (Recommended) and enter the role name. Click Next.
  10. Select Add user names here as the sync scope and add a few OneDrive accounts.
  11. Select Full sync as the sync mode and Run on demand as the sync run schedule. Click Next.
  12. Click Next. Review the data source configuration and click Add data source.
  13. Once the data source is created successfully, click Sync now.
  14. Verify the sync completion and status.
  15. Select Search indexed content in the navigation pane, search for content stored in your OneDrive account, and review the results.
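The last three steps (sync, verify, search) can also be scripted. The sketch below uses the boto3 Kendra calls `start_data_source_sync_job`, `list_data_source_sync_jobs`, and `retrieve`. It assumes you already have the index ID and data source ID from the console; the helper names and polling defaults are hypothetical, and it treats the most recent entry in the sync history as the job it just started.

```python
from typing import List, Optional, Tuple

IN_PROGRESS = {"SYNCING", "SYNCING_INDEXING", "STOPPING"}


def latest_sync_status(history: List[dict]) -> Optional[str]:
    """Return the status of the most recently started sync job, if any."""
    if not history:
        return None
    return max(history, key=lambda job: job["StartTime"])["Status"]


def summarize_results(response: dict) -> List[Tuple[str, str]]:
    """Reduce a Retrieve response to (title, URI) pairs."""
    return [
        (item.get("DocumentTitle", ""), item.get("DocumentURI", ""))
        for item in response.get("ResultItems", [])
    ]


def sync_and_search(index_id: str, data_source_id: str, query: str, region: str,
                    poll_seconds: int = 60, max_polls: int = 120) -> List[Tuple[str, str]]:
    """Start a sync, wait for it to finish, then run one query."""
    import time

    import boto3  # imported lazily so the helpers above work without the AWS SDK

    kendra = boto3.client("kendra", region_name=region)
    kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)

    status = None
    for _ in range(max_polls):
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        status = latest_sync_status(history)
        if status is not None and status not in IN_PROGRESS:
            break
        time.sleep(poll_seconds)

    if status != "SUCCEEDED":
        raise RuntimeError(f"Sync did not succeed (last status: {status})")
    return summarize_results(kendra.retrieve(IndexId=index_id, QueryText=query))
```

The two pure helpers are kept separate from the AWS calls so they can be checked without credentials.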

Conclusion

In this article, you learned how to create and configure an Amazon Kendra GenAI index using Microsoft OneDrive as a data source.