@@ -8,68 +8,79 @@ author: HeidiSteen
ms.author: heidist
ms.service: azure-ai-search
ms.topic: how-to
-ms.date: 05/05/2025
+ms.date: 05/30/2025
---
# Retrieve data using a knowledge agent in Azure AI Search
[!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
-In Azure AI Search, *agentic retrieval* is a new parallel query architecture that uses a conversational large language model (LLM) for query planning, generating subqueries that broaden the scope of what's searchable and relevant.
+In Azure AI Search, *agentic retrieval* is a new parallel query architecture that uses a chat completion model for query planning. It generates subqueries that broaden the scope of what's searchable and relevant.
-This article explains how to use the [**retrieve** method](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-05-01-preview&preserve-view=true) that invokes a knowledge agent and parallel query processing. This article also explains the three components of the retrieval response:
+This article explains how to use the [**retrieve method**](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-05-01-preview&preserve-view=true) that invokes a knowledge agent and parallel query processing. This article also explains the three components of the retrieval response:
+
++ *extracted response for the LLM*
++ *referenced results*
++ *query activity*
+The retrieve request can include instructions for query processing that override the defaults set on the knowledge agent.
+
> [!NOTE]
-> Currently, there's no model-generated "answer" in the response. Instead, the response provides grounding data that you can use to generate an answer from an LLM. For an end-to-end example, see [Build an agent-to-agent retrieval solution ](search-agentic-retrieval-how-to-pipeline.md) or [Azure OpenAI Demo](https://github.com/Azure-Samples/azure-search-openai-demo).
+> There's no model-generated "answer" in the response. Instead, the response provides content that you pass to an LLM, which grounds its answer in that content. For an end-to-end example that includes this step, see [Build an agent-to-agent retrieval solution](search-agentic-retrieval-how-to-pipeline.md) or the [Azure OpenAI Demo](https://github.com/Azure-Samples/azure-search-openai-demo).
## Prerequisites
-+ A [knowledge agent definition](search-agentic-retrieval-how-to-create.md) that represents a conversational language model.
++ A [knowledge agent](search-agentic-retrieval-how-to-create.md) that represents the chat completion model and a valid target index.
+
++ Azure AI Search, in any [region that provides semantic ranker](search-region-support.md), on basic tier and higher. Your search service must have a [managed identity](search-howto-managed-identities-data-sources.md) for role-based access to a chat completion model.
-+ Azure AI Search, in any [region that provides semantic ranker](search-region-support.md), on basic tier and above. Your search service must have a [managed identity](search-howto-managed-identities-data-sources.md) for role-based access to a chat model.
++ Permissions on Azure AI Search. **Search Index Data Reader** can run queries on Azure AI Search, but the search service managed identity must have **Cognitive Services User** permissions on the Azure OpenAI resource. For more information about local testing and obtaining access tokens, see [Quickstart: Connect without keys](search-get-started-rbac.md).
-+ API requirements. Use 2025-05-01-preview data plane REST API or a prerelease package of an Azure SDK that provides knowledge agent APIs.
++ API requirements. To create or use a knowledge agent, use [2025-05-01-preview](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-05-01-preview&preserve-view=true) data plane REST API. Or, use a prerelease package of an Azure SDK that provides knowledge agent APIs: [Azure SDK for Python](https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/search/azure-search-documents/CHANGELOG.md), [Azure SDK for .NET](https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/search/Azure.Search.Documents/CHANGELOG.md#1170-beta3-2025-03-25), [Azure SDK for Java](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/search/azure-search-documents/CHANGELOG.md).
To follow the steps in this guide, we recommend [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) for sending REST API calls to Azure AI Search. There's no portal support at this time.
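+
+If you're using role-based access for local testing, the `Authorization` header in the examples needs a bearer token. Here's a minimal Python sketch for obtaining one, assuming the `azure-identity` package and an existing Azure CLI sign-in (`az login`). The printed value is what you'd paste into the `@accessToken` variable used later in this article.
+
+```python
+# Minimal sketch: obtain a bearer token for Azure AI Search data plane calls.
+# Assumes azure-identity is installed and you're signed in, for example with az login.
+from azure.identity import DefaultAzureCredential
+
+credential = DefaultAzureCredential()
+
+# Azure AI Search data plane requests use the https://search.azure.com/.default scope.
+token = credential.get_token("https://search.azure.com/.default")
+
+print(token.token)  # paste this value into @accessToken for the REST examples
+```
+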
## Call the retrieve action
Call the **retrieve** action on the knowledge agent object to invoke retrieval and return a response. Use the [2025-05-01-preview](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-05-01-preview&preserve-view=true) data plane REST API or an Azure SDK prerelease package that provides equivalent functionality for this task.
+All `searchable` fields in the search index are in scope for query execution. If the index includes vector fields, it should have a valid vectorizer definition so that the query inputs can be vectorized. Otherwise, vector fields are ignored. The implied query type is `semantic`, and there's no search mode or selection of search fields.
+
The input for the retrieval route is chat conversation history in natural language, where the `messages` array contains the conversation.
```http
+@search-url=<YOUR SEARCH SERVICE URL>
+@accessToken=<YOUR PERSONAL ACCESS TOKEN>
+
# Send Grounding Request
POST https://{{search-url}}/agents/{{agent-name}}/retrieve?api-version=2025-05-01-preview
-api-key: {{search-api-key}}
-Content-Type: application/json
+Content-Type: application/json
+Authorization: Bearer {{accessToken}}
{
"messages" : [
{
- "role" : "system",
+ "role" : "assistant",
"content" : [
- { "type" : "text", "text" : "You are a helpful assistant for Contoso Human Resources. You have access to a search index containing guidelines about health care coverage for Washington state. If you can't find the answer in the search, say you don't know." }
+ { "type" : "text", "text" : "You can answer questions about the Earth at night.
+ Sources have a JSON format with a ref_id that must be cited in the answer.
+ If you do not have the answer, respond with "I don't know"." }
]
},
{
"role" : "user",
"content" : [
- { "type" : "text", "text" : "What are my vision benefits?" }
+ { "type" : "text", "text" : "Why is the Phoenix nighttime street grid is so sharply visible from space, whereas large stretches of the interstate between midwestern cities remain comparatively dim?" }
]
}
],
"targetIndexParams" : [
{
"indexName" : "{{index-name}}",
- "filterAddOn" : "State eq 'WA'",
+ "filterAddOn" : "page_number eq 105'",
"IncludeReferenceSourceData": true,
- "rerankerThreshold " : 2.5,
- "maxDocsForReranker": 250
+ "rerankerThreshold" : 2.5,
+ "maxDocsForReranker": 50
}
]
}
@@ -79,21 +90,23 @@ Content-Type: application/json
+ `messages` articulates the messages sent to the model. The message format is similar to Azure OpenAI APIs.
- + `role` defines where the message came from, for example either `system` or `user`. The model you use determines which roles are valid.
+ + `role` defines where the message came from, for example either `assistant` or `user`. The model you use determines which roles are valid.
+ `content` is the message sent to the LLM. It must be text in this preview.
+ `targetIndexParams` provide instructions on the retrieval. Currently in this preview, you can only target a single index.
+ `filterAddOn` lets you set an [OData filter expression](search-filters.md) for keyword or hybrid search.
- + `IncludeReferenceSourceData` is initially set in the knowledge agent definition. You can override that setting in the retrieve action to return grounding data in the [references section](#review-the-references-array) of the response.
+  + `includeReferenceSourceData` tells the retrieval engine to return the original source content in the response. This value is initially set in the knowledge agent definition, and you can override it in the retrieve action to return source content in the [references section](#review-the-references-array) of the response.
+ `rerankerThreshold` and `maxDocsForReranker` are also initially set in the knowledge agent definition as defaults. You can override them in the retrieve action to configure [semantic reranker](semantic-how-to-configure.md), setting minimum thresholds and the maximum number of inputs sent to the reranker.
`rerankerThreshold` is the minimum semantic reranker score that's acceptable for inclusion in a response. [Reranker scores](semantic-search-overview.md#how-ranking-is-scored) range from 1 to 4. Plan on revising this value based on testing and what works for your content.
- `maxDocsForReranker` dictates the maximum number of documents to consider for the final response string. Semantic reranker accepts 50 documents. If the maximum is 200, four more subqueries are added to the query plan to ensure all 200 documents are semantically ranked. for semantic ranking. If the number isn't evenly divisible by 50, the query plan rounds up to nearest whole number.
+      `maxDocsForReranker` dictates the maximum number of documents to consider for the final response string. Semantic reranker accepts 50 documents. If the maximum is 200, four more subqueries are added to the query plan to ensure all 200 documents are semantically ranked. If the number isn't evenly divisible by 50, the query plan rounds up to the nearest whole number of subqueries.
+
+   The `content` portion of the response consists of 200 chunks or fewer, excluding any results that fail to meet the minimum reranker score threshold of 2.5. For a code version of this request, see the sketch after this list.
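+
+If you're calling the REST API from code instead of a REST client, here's a minimal Python sketch of the same retrieve request using the `requests` package. The service URL, agent name, index name, and access token are placeholders that you supply; the payload mirrors the request shown earlier.
+
+```python
+# Minimal sketch: call the retrieve action with the requests package.
+import requests
+
+search_url = "https://<YOUR-SEARCH-SERVICE>.search.windows.net"   # placeholder
+agent_name = "<YOUR-AGENT-NAME>"                                  # placeholder
+index_name = "<YOUR-INDEX-NAME>"                                  # placeholder
+access_token = "<YOUR-ACCESS-TOKEN>"                              # placeholder
+
+endpoint = f"{search_url}/agents/{agent_name}/retrieve?api-version=2025-05-01-preview"
+
+payload = {
+    "messages": [
+        {
+            "role": "assistant",
+            "content": [{"type": "text", "text": "You can answer questions about the Earth at night. Sources have a JSON format with a ref_id that must be cited in the answer. If you do not have the answer, respond with \"I don't know\"."}]
+        },
+        {
+            "role": "user",
+            "content": [{"type": "text", "text": "Why is the Phoenix nighttime street grid so sharply visible from space?"}]
+        }
+    ],
+    "targetIndexParams": [
+        {
+            "indexName": index_name,
+            "includeReferenceSourceData": True,
+            "rerankerThreshold": 2.5,
+            "maxDocsForReranker": 50
+        }
+    ]
+}
+
+response = requests.post(
+    endpoint,
+    headers={"Authorization": f"Bearer {access_token}", "Content-Type": "application/json"},
+    json=payload,
+)
+response.raise_for_status()
+result = response.json()  # used in the parsing sketches later in this article
+```
+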
## Review the extracted response
@@ -104,22 +117,25 @@ The body of the response is also structured in the chat message style format. Cu
```http
"response": [
{
- "role": "system",
+ "role": "assistant",
"content": [
{
"type": "text",
- "text": "[{\"ref_id\":0,\"title\":\"Vision benefits\",\"terms\":\"exams, frames, contacts\",\"content\":\"<content chunk>\"}]"
+ "text": "[{\"ref_id\":0,\"title\":\"Urban Structure\",\"terms\":\"Location of Phoenix, Grid of City Blocks, Phoenix Metropolitan Area at Night\",\"content\":\"<content chunk redacted>\"}]"
}
]
}
]
```
-`content` is a JSON array. It's a single string composed of the most relevant documents (or chunks) found in the search index, given the query and chat history inputs. This array is your grounding data that a conversational language model uses to formulate a response to the user's question.
+**Key points**:
-The `maxOutputSize` property on the knowledge agent determines the length of the string. We recommend 5,000 tokens.
++ `content` is a JSON array serialized as a single string, composed of the most relevant documents (or chunks) found in the search index, given the query and chat history inputs. This array is your grounding data that a conversational language model uses to formulate a response to the user's question.
-Fields in the content `text` response string include the ref_id and semantic configuration fields: `title`, `terms`, `terms`.
++ `text` is the only valid value for `type`. The text consists of the reference ID of the chunk (used for citation purposes) and any fields specified in the semantic configuration of the target index. In this example, assume the semantic configuration of the target index has a "title" field, a "terms" field, and a "content" field. For a sketch that parses this string, see the example after the note below.
+
+> [!NOTE]
+> The `maxOutputSize` property on the [knowledge agent](search-agentic-retrieval-how-to-create.md) determines the length of the string. We recommend 5,000 tokens.
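+
+Because the `text` value is a serialized JSON array, parse it before you pass it to your chat completion call or inspect individual chunks. Here's a minimal Python sketch, assuming `result` holds the parsed retrieve response from the earlier request sketch:
+
+```python
+import json
+
+# The grounding data is a JSON array serialized as a single string.
+grounding_text = result["response"][0]["content"][0]["text"]
+chunks = json.loads(grounding_text)
+
+for chunk in chunks:
+    # Each chunk carries a ref_id for citations plus the semantic configuration fields.
+    print(chunk["ref_id"], chunk.get("title"), chunk.get("terms"))
+```
+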
## Review the activity array
@@ -137,36 +153,53 @@ Output includes:
Here's an example of an activity array.
```json
- "activity": [
+"activity": [
{
"type": "ModelQueryPlanning",
"id": 0,
- "inputTokens": 1308,
- "outputTokens": 141
+ "inputTokens": 1261,
+ "outputTokens": 270
},
{
"type": "AzureSearchQuery",
"id": 1,
- "targetIndex": "myindex",
+ "targetIndex": "earth_at_night",
"query": {
- "search": "hello world programming",
+ "search": "suburban belts December brightening urban cores comparison",
"filter": null
},
- "queryTime": "2025-04-25T16:40:08.811Z",
- "count": 2,
- "elapsedMs": 867
+ "queryTime": "2025-05-30T21:23:25.944Z",
+ "count": 0,
+ "elapsedMs": 600
},
{
"type": "AzureSearchQuery",
"id": 2,
- "targetIndex": "myindex",
+ "targetIndex": "earth_at_night",
"query": {
- "search": "hello world meaning",
+ "search": "Phoenix nighttime street grid visibility from space",
"filter": null
},
- "queryTime": "2025-04-25T16:40:08.955Z",
+ "queryTime": "2025-05-30T21:23:26.128Z",
"count": 2,
- "elapsedMs": 136
+ "elapsedMs": 161
+ },
+ {
+ "type": "AzureSearchQuery",
+ "id": 3,
+ "targetIndex": "earth_at_night",
+ "query": {
+ "search": "interstate visibility from space midwestern cities",
+ "filter": null
+ },
+ "queryTime": "2025-05-30T21:23:26.277Z",
+ "count": 0,
+ "elapsedMs": 147
+ },
+ {
+ "type": "AzureSearchSemanticRanker",
+ "id": 4,
+ "inputTokens": 2622
}
],
```
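+
+The activity array is also a convenient way to review token usage and per-query latency. Here's a minimal Python sketch, assuming `result` holds the parsed retrieve response from the earlier request sketch:
+
+```python
+# Tally token usage and report per-query timing from the activity array.
+total_tokens = 0
+for step in result.get("activity", []):
+    total_tokens += step.get("inputTokens", 0) + step.get("outputTokens", 0)
+    if step["type"] == "AzureSearchQuery":
+        print(f'{step["query"]["search"]}: {step["count"]} results in {step["elapsedMs"]} ms')
+
+print(f"Tokens used for query planning and semantic ranking: {total_tokens}")
+```
+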
@@ -175,6 +208,8 @@ Here's an example of an activity array.
The `references` array is a direct reference from the underlying grounding data and includes the `sourceData` used to generate the response. It consists of every single document that was found and semantically ranked by the search engine. Fields in the `sourceData` include an `id` and semantic fields: `title`, `terms`, `content`.
+The `id` is a reference ID for an item within a specific response. It's not the document key in the search index. It's used for providing citations.
+
The purpose of this array is to provide a chat message style structure for easy integration. For example, you might serialize the results into a different structure, or you might need to programmatically manipulate the data before returning it to the user.
You can also get the structured data from the source data object in the references array to manipulate it however you see fit.
@@ -187,39 +222,23 @@ Here's an example of the references array.
"type": "AzureSearchDoc",
"id": "0",
"activitySource": 2,
- "docKey": "2",
- "sourceData": {
- "id": "2",
- "parent": {
- "title": null,
- "content": "good by cruel world"
- }
- }
+ "docKey": "earth_at_night_508_page_104_verbalized",
+ "sourceData": null
},
{
"type": "AzureSearchDoc",
"id": "1",
"activitySource": 2,
- "docKey": "4",
- "sourceData": {
- "id": "4",
- "parent": {
- "title": "zzzzzzz",
- "content": "zzzzzzz"
- }
- }
+ "docKey": "earth_at_night_508_page_105_verbalized",
+ "sourceData": null
}
]
```
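+
+The reference `id` values correspond to the `ref_id` values in the grounding data, so you can resolve citations produced by the LLM back to documents in the search index. Here's a minimal Python sketch, assuming `result` holds the parsed retrieve response from the earlier request sketch:
+
+```python
+# Map each reference id to its document key (and source content, when returned).
+citations = {ref["id"]: ref["docKey"] for ref in result.get("references", [])}
+
+# Example: resolve a citation such as ref_id 0 to its document key in the search index.
+print(citations.get("0"))
+```
+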
-<!-- Create H2s for the main patterns. -->
-<!-- This section is in progress. It needs a code sample for the simple case showing how to pipeline ground data to chat completions and responses -->
-## Use data for grounding
-
-The `includeReferenceSourceData` parameter tells the search engine to provide grounding data to the knowledge agent.
-
## Related content
+ [Agentic retrieval in Azure AI Search](search-agentic-retrieval-concept.md)
++ [Agentic RAG: build a reasoning retrieval engine with Azure AI Search (YouTube video)](https://www.youtube.com/watch?v=PeTmOidqHM8)
+
+ [Azure OpenAI Demo featuring agentic retrieval](https://github.com/Azure-Samples/azure-search-openai-demo)