
How to use your own data within a GPT-4 prompt hosted on Azure Web Services

1 February 2025, JULITH GmbH, Thomas RIEHN

Overview and Motivation

Embedding self-managed content into Large Language Models (LLMs) like GPT-4 offers numerous advantages and compelling reasons for its implementation. One key reason is that it allows for the integration of domain-specific knowledge that may not be universally available in the model's original training data. This is particularly beneficial in industries like healthcare, law, or engineering, where specialized information is crucial. By embedding such content, the model can provide more accurate, context-specific, and detailed responses, significantly improving its value in niche applications.

Another advantage is the ability to ensure improved accuracy and relevance. While LLMs like GPT-4 have a broad understanding of language and concepts, they may lack up-to-date or highly specialized information. Embedding self-managed content, such as proprietary datasets or specialized knowledge, ensures that the model delivers accurate, relevant, and timely information, which is particularly valuable when the model’s training data might not cover recent developments or unique subject matter.

Embedding self-managed content also enables personalization, allowing the model to adapt to user preferences, historical interactions, or specific contextual needs. This makes it possible for organizations to create tailored experiences, ensuring that the LLM aligns with individual user preferences or organizational standards. Additionally, by embedding carefully curated and up-to-date information, it becomes easier to reduce the likelihood of "hallucinations," which occur when an LLM generates plausible-sounding but inaccurate information. Self-managed content offers a controlled, verified source of information, enhancing the reliability of the model's outputs.

The ability to continuously update and adapt the model is another significant benefit. With embedded self-managed content, organizations can update the model’s knowledge base with new information as it becomes available. This means that the LLM can stay relevant and accurate over time without requiring complete retraining or relying on infrequent updates from the model's developers. Furthermore, organizations have greater control over the knowledge embedded in the LLM, allowing them to ensure that the model’s outputs are aligned with company policies, ethical standards, and legal requirements.

Embedding self-managed content also reduces dependence on external data sources, improving the speed and consistency of the model’s responses. Rather than relying on third-party systems or services, the LLM can directly access proprietary or frequently-used information, making it more efficient. This leads to operational efficiency as the model can handle specialized tasks, such as customer support or decision-making, without requiring human intervention for basic queries.

From a data security perspective, embedding self-managed content provides a higher level of control over sensitive or confidential information. Organizations can apply their own data governance policies, ensuring that the model’s outputs are secure and comply with privacy regulations. Moreover, integrating self-managed content with existing organizational systems, like databases or CRM platforms, allows for a seamless flow of information, enhancing the model’s capability to support complex tasks like analytics and reporting.

Finally, embedding self-managed content can be cost-effective, as it reduces the need for relying on external APIs or continuously updating training data, which can be expensive. This approach allows organizations to decrease their reliance on paid data sources and API calls, ultimately lowering operational costs. In summary, embedding self-managed content into LLMs like GPT-4 enables organizations to harness more accurate, secure, and tailored outputs, improving both the model's effectiveness and its operational efficiency.

Basic Configuration of Azure-Webservices

Advantages of Azure OpenAI Services

Azure OpenAI Services offer several advantages when it comes to embedding self-managed content into Large Language Models (LLMs) like GPT-4, making it a powerful platform for organizations seeking to integrate their own proprietary data while maintaining control, security, and scalability. One of the key benefits is customization. Azure allows organizations to fine-tune models on their own datasets, enabling them to embed domain-specific, self-managed content directly into the model. This ensures that the model can generate more accurate and contextually relevant responses based on the unique data it has been provided.

Another significant advantage is scalability. Azure's infrastructure is built to handle large-scale applications, so when self-managed content is embedded into models, it can efficiently scale across millions of queries or users without compromising performance. This is particularly beneficial for enterprises that require robust, high-performance AI services capable of managing large volumes of data while ensuring that embedded content remains effective and accessible.

Security and compliance are also notable advantages of Azure OpenAI Services. Azure provides enterprise-grade security and compliance frameworks, ensuring that organizations can embed sensitive or proprietary self-managed content securely. Data is protected with advanced encryption methods, and compliance with various regulations, such as GDPR and HIPAA, is facilitated by Azure’s comprehensive tools and certifications. This is especially crucial for businesses in industries that handle sensitive data and need to ensure that their AI solutions meet stringent legal and ethical requirements.

Additionally, integration with existing Azure services is another key benefit. Organizations that are already using other Azure services, such as Azure Cognitive Services, Azure Databases, or Azure Machine Learning, can seamlessly integrate their self-managed content into the OpenAI models. This interoperability allows for a more streamlined workflow, where data can be pulled from other internal systems and used to train or fine-tune models. This makes the embedding of self-managed content into LLMs much easier and more effective, enhancing the overall user experience.

Azure also enables continuous updates to embedded content. This flexibility allows organizations to frequently update their self-managed content, whether to reflect new knowledge, changing regulations, or evolving business needs. With Azure’s AI infrastructure, updates can be made quickly and efficiently without the need for re-training from scratch, ensuring that the model remains current and aligned with the organization’s requirements.

Furthermore, Azure offers advanced monitoring and management tools that help organizations oversee the performance of their embedded content in real time. Azure’s tools allow for detailed insights into how the model is performing, how the embedded content is influencing the responses, and where adjustments might be needed. This data-driven approach enhances decision-making and ensures that the integration of self-managed content is continuously optimized for the best results.

In summary, Azure OpenAI Services provide a comprehensive and flexible platform for embedding self-managed content into models like GPT-4. The advantages include customization for domain-specific knowledge, scalability to handle large data volumes, strong security and compliance features, seamless integration with existing Azure services, the ability for continuous updates, and advanced monitoring tools—all of which contribute to making the process of embedding self-managed content highly effective and secure for organizations.

Getting started

In the following example we want to embed text documents, which requires several Azure services. But first of all: log in to your Azure account ;)


Creating a new subscription

We start our project using a new, dedicated subscription. You can skip this chapter if you don't need a separate subscription.

Click on "Subscriptions" and the subscription overview opens up. After clicking "Add" you can enter the details for the new subscription. After clicking "Review + Create" you see the details of the subscription to be created, and after clicking "Create" the subscription will be created.

Creating a new resource group

We continue by setting up a new resource group within the newly created subscription.

We create the resource group by clicking "Create" within the overview of the existing resource groups ("Settings" -> "Resource Groups").

Creating a new storage account

By clicking "+ Create Resources" we create a storage account.

As the primary service we select "Azure Blob Storage or Azure Data Lake Storage Gen 2".

After clicking "Review + Create" we see the summary of the storage account.

After clicking "Create" the storage account will be created.

Creating a new storage container

We go to the storage account by clicking "Go to resource".

In the following step we need to create a container ("Data storage" -> "Containers"). After clicking "+ Container" we can define the name of the container within the pane on the right-hand side.

Configure metadata of storage container

This is an optional step if you want to add metadata to the data within the storage container. To do so, click on the three dots next to the created storage container and select "Edit metadata" in the context menu.

We add two fields for testing purposes (the document name and a URL) to demonstrate the possibilities.

Import Data into Storage Container

Entering the container, you have the possibility to upload content. For our test purposes we decided to upload text files; other data formats can be imported as well, and Azure is even capable of identifying content within pictures. Regarding data size, we are limited to 64 KB per document due to the pricing tier we will select later during the configuration of AI Search.


After clicking "Upload" you can select the files to be uploaded. For demonstration purposes we selected "Cool" as the access tier.

After uploading, the container overview lists the uploaded files.
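The portal upload can also be scripted. The following Python sketch builds the REST request for uploading a text file with custom metadata to the blob container; the account name, container name, SAS token, and metadata values are illustrative placeholders, not values from this tutorial.

```python
# Sketch: upload a text file with custom metadata via the Blob Storage
# "Put Blob" REST API, assuming a SAS token with write permission.
import urllib.request

def build_upload_request(account: str, container: str, blob_name: str,
                         sas_token: str, data: bytes,
                         metadata: dict) -> urllib.request.Request:
    """Build a PUT request that creates a block blob and sets metadata headers."""
    url = f"https://{account}.blob.core.windows.net/{container}/{blob_name}?{sas_token}"
    headers = {
        "x-ms-blob-type": "BlockBlob",   # required for Put Blob
        "x-ms-access-tier": "Cool",      # matches the tier chosen above
        "Content-Type": "text/plain",
    }
    # Custom metadata (e.g. document name, URL) travels as x-ms-meta-<name> headers.
    for key, value in metadata.items():
        headers[f"x-ms-meta-{key}"] = value
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")

req = build_upload_request(
    "mystorageaccount", "mycontainer", "doc1.txt", "sv=...&sig=...",
    b"example content",
    {"documentname": "doc1", "url": "https://example.com/doc1"},
)
# urllib.request.urlopen(req)  # uncomment to actually upload
```

The same headers work with `az storage blob upload` or the Azure SDKs; the point here is that the metadata configured in the portal is nothing more than `x-ms-meta-*` headers on the blob.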

Configuring AI Search Component

Next we want to set up AI Search. First of all we need to create a corresponding resource.

Creating an AI Search

After clicking "+ Create" we are asked for the details.

Within the "Pricing Tier" selection you can choose between the available tiers; see the comparison table at https://learn.microsoft.com/en-us/azure/search/search-sku-tier.

After clicking "Review + Create" we see the summary of the search service.

After clicking "Create" the AI Search service will be created.

Setting-up the necessary identities

For the seamless integration of the services we highly recommend setting up an identity-based configuration. It is necessary to change the identity to system-assigned.

After confirming the change with the "Save" button, an Azure object will be created.

Next it is necessary to change the "Azure role assignments" by clicking on the button of the same name.

Two role assignments are necessary: one for the resource group and one for the storage account. We continue with the "+ Add role assignment" button.

Resource Group

We define "Resource Group" as the scope and select the resource group created above. The role which needs to be selected is the "Cognitive Services User" role.

Storage Account

We define "Storage" as the scope and select the storage account created above. The role which needs to be selected is the "Storage Blob Data Reader" role.

Change Authentication to RBAC

Within the keys section we have to switch from API keys to role-based access control.

After the successful switch, role-based access control is shown as the active authentication method.

Create an Azure OpenAI Service

Add Azure OpenAI to your subscription

To add Azure OpenAI to your subscription, search for "OpenAI" in the Azure portal.

Once you have found the service, you can create the service instance. During this process we need to select a region, a name, and the pricing tier.

After clicking "Next" we are asked to select the type of network security we want for the AI Services resource.

After clicking "Next" we are asked to enter additional tags, which we leave blank.

After clicking "Next" and "Review + create" the service will be created.

Resource Group overview

Our resource group now contains the storage account, the AI Search service, and the Azure OpenAI service.

Role and role assignments

Step 1: Set-up Role-Identity

Within "Resource Management" we find the option to switch to Azure role-based access control (Azure RBAC). The procedure is the same as for the AI Search service.

After saving the status change from "Off" to "On" we can see the created Azure Object ID.

Step 2: Set-up Azure role assignments

We need to click "Azure role assignments" first.

Step 3: Search Service Contributor

Within the scope of the resource group we add the role "Search Service Contributor".

Step 4: Search Index Data Reader

Within the scope of the resource group we add the role "Search Index Data Reader".

Azure OpenAI Overview

Clicking on the OpenAI resource opens its overview page.

Configure Service

By clicking on "Explore Azure AI Foundry portal" a new tab opens in the browser and you enter the Chat Playground.

Append Language Models

We now want to add two different language models.

GPT-4 as language model for the chat prompt

Referring to the first steps, we open the model catalog.

Once we have found it, we can deploy this language model.

ada-002 as language model for the embedding

We search within the model catalog for "ada".

Once we have found it, we can deploy this language model.
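The deployed embedding model can be called directly over the Azure OpenAI REST API. A minimal Python sketch, assuming a hypothetical endpoint, deployment name, and API version (check your resource's "Keys and Endpoint" page and the current API reference for real values):

```python
# Sketch: request an embedding from the deployed ada-002 model.
# Endpoint, deployment name, API key, and API version are placeholders.
import json
import urllib.request

def build_embedding_request(endpoint: str, deployment: str, api_key: str,
                            texts: list[str],
                            api_version: str = "2024-02-01") -> urllib.request.Request:
    """Build the POST request for the embeddings endpoint of a deployment."""
    url = (f"{endpoint}/openai/deployments/{deployment}"
           f"/embeddings?api-version={api_version}")
    body = json.dumps({"input": texts}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_embedding_request(
    "https://my-openai-resource.openai.azure.com", "ada-002-deployment",
    "<api-key>", ["What does the tutorial cover?"],
)
# response = urllib.request.urlopen(req)  # JSON with one embedding vector per input
```

The "Import and vectorize data" wizard configured later calls this same deployment under the hood to embed every document chunk.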

Import Data into the Search Index

Vectorize Content for Azure Search

We switch back to the created search service within our resource group and click on "Import and vectorize data".

Step 1: Data connection

In the first step we need to set up our data connection (Azure Blob Storage in our case).

Within this step it is necessary to select authentication via the system-assigned managed identity.

Step 2: Vectorization model

In the next step it is necessary to select the OpenAI service and the deployed text-embedding model.

Step 3: Vectorization of images

In our test case we leave this blank.

Step 4: Advanced Settings

We highly recommend enabling the semantic ranker and leaving the indexing schedule at "Once".

Step 5: Review and create

We review the settings and create the configuration.

After clicking "Start searching" you reach the overview of the created vector index.

Within that final process the index, the indexer, and the skillset have been generated.

During the indexing process you can monitor the progress.

If everything worked, the indexing completes without errors and all documents are searchable.
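Once indexing has finished, the index can also be queried outside the portal. Below is a minimal sketch of a semantic query against the Azure AI Search REST API; the service name, index name, and semantic configuration name are placeholders, and with RBAC enabled (as configured above) a client would send an Entra ID bearer token instead of the api-key header shown here:

```python
# Sketch: semantic search query against the generated index.
# Service, index, key, and semantic configuration names are placeholders.
import json
import urllib.request

def build_search_request(service: str, index: str, api_key: str, query: str,
                         semantic_config: str,
                         api_version: str = "2024-07-01") -> urllib.request.Request:
    """Build the POST request for the 'Search Documents' REST operation."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={api_version}")
    body = json.dumps({
        "search": query,
        "queryType": "semantic",                  # uses the semantic ranker enabled above
        "semanticConfiguration": semantic_config,
        "top": 5,                                 # return the five best-ranked chunks
    }).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_search_request("my-search-service", "my-index", "<api-key>",
                           "What is in the uploaded documents?",
                           "my-semantic-config")
```

This is roughly the query that the chat playground issues on your behalf when the data source is attached in the next section.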

Configure Azure OpenAI Prompt

Add data source

Before we can finally set up the chat prompt, we need to add our data source within the Chat Playground.

Step 1: Selection of data source


We select "Azure AI Search" and our search index.

We select the option to add vector search to this search resource and choose the embedding model.

Step 2: Data management

We keep the hybrid and semantic search type and select the existing semantic search configuration.

Step 3: Data connection

We select that we want to use the system-assigned managed identity.

If you get an error at this point, step back and set up the roles correctly.

Step 4: Review & Create

We review and create the settings.

Conclusion

Now you are able to use the prompt on the right-hand side.
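The playground prompt can also be reproduced programmatically with the "On Your Data" extension of the chat completions API. The following is a sketch under assumed names (endpoint, deployments, search service, index); the exact field names depend on the API version, so verify them against the current Azure OpenAI reference:

```python
# Sketch: chat completion that grounds GPT-4 in the AI Search index
# ("On Your Data"). All resource names and the API key are placeholders.
import json
import urllib.request

def build_chat_request(endpoint: str, gpt_deployment: str, api_key: str,
                       search_endpoint: str, index: str, question: str,
                       api_version: str = "2024-02-01") -> urllib.request.Request:
    """Build the POST request attaching the search index as a data source."""
    url = (f"{endpoint}/openai/deployments/{gpt_deployment}"
           f"/chat/completions?api-version={api_version}")
    body = json.dumps({
        "messages": [{"role": "user", "content": question}],
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": search_endpoint,
                "index_name": index,
                # system-assigned managed identity, as configured above
                "authentication": {"type": "system_assigned_managed_identity"},
            },
        }],
    }).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")

req = build_chat_request(
    "https://my-openai-resource.openai.azure.com", "gpt-4-deployment",
    "<api-key>", "https://my-search-service.search.windows.net", "my-index",
    "Summarize the uploaded documents.",
)
# urllib.request.urlopen(req)  # answer grounded in the embedded documents
```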

Remark

It could be necessary to add the user to the roles of the "Azure OpenAI" object as well.

Assigning a role to the querying user


Select the role.

Add the necessary user to the role.

And "Review + assign" the role membership.
