# Highlights
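This documentation update consists mostly of minor fixes. There are no breaking changes, and the only content addition is a new Markdown entry in the list of supported blob data source formats. The remaining edits unify syntax and style and tighten wording so that readers get clearer, more consistent information.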
New features
- The list of supported blob data source formats now includes Markdown.
Breaking changes
Other updates
- Reference labels and styles were unified and wording was clarified, improving overall readability.
- Dates and similar metadata were updated to current values.
Insights
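This update focuses mainly on stylistic consistency and uniform wording. Technical terms and procedural descriptions are clearer, which improves how information reaches the reader. In particular, the bolded `**Reference:**` labels, the renamed "Related content" section, and the corrected anchor link make related information easier to find.
The addition of Markdown to the blob data source list broadens the documented format choices, which helps users who work with a variety of data formats.
Overall, these updates make the documentation look more polished, help readers reach the information they need quickly, and should also ease future maintenance.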
Summary Table
Modified Contents
articles/search/hybrid-search-ranking.md
Diff
@@ -1,31 +1,31 @@
---
-title: Hybrid search scoring (RRF)
+title: Hybrid Search Scoring (RRF)
titleSuffix: Azure AI Search
-description: Describes the Reciprocal Rank Fusion (RRF) algorithm used to unify search scores from parallel queries in Azure AI Search.
+description: Learn about the Reciprocal Rank Fusion (RRF) algorithm used to unify search scores from parallel queries in Azure AI Search.
author: yahnoosh
ms.author: jlembicz
ms.service: azure-ai-search
ms.custom:
- ignite-2023
ms.topic: conceptual
-ms.date: 09/28/2025
+ms.date: 01/21/2026
---
# Relevance scoring in hybrid search using Reciprocal Rank Fusion (RRF)
Reciprocal Rank Fusion (RRF) is an algorithm that evaluates the search scores from multiple, previously ranked results to produce a unified result set. In Azure AI Search, RRF is used when two or more queries execute in parallel. Namely, for [hybrid queries](hybrid-search-overview.md) and for [multiple vector queries](vector-search-overview.md). Each individual query produces a ranked result set, and RRF merges and homogenizes the rankings into a single result set for the query response.
-RRF is based on the concept of *reciprocal rank*, which is the inverse of the rank of the first relevant document in a list of search results. The goal of the technique is to take into account the position of the items in the original rankings, and give higher importance to items that are ranked higher in multiple lists. This can help improve the overall quality and reliability of the final ranking, making it more useful for the task of fusing multiple ordered search results.
+RRF is based on the concept of *reciprocal rank*, which is the inverse of the rank of the first relevant document in a list of search results. The goal of the technique is to take into account the position of the items in the original rankings and give higher importance to items that are ranked higher in multiple lists. This approach can help improve the overall quality and reliability of the final ranking, making it more useful for the task of fusing multiple ordered search results.
## How RRF ranking works
RRF works by taking the search results from multiple methods, assigning a reciprocal rank score to each document in the results, and then combining the scores to create a new ranking. The concept is that documents appearing in the top positions across multiple search methods are likely to be more relevant and should be ranked higher in the combined result.
Here's a simple explanation of the RRF process:
-1. Obtain ranked search results from multiple queries executing in parallel.
+1. Get ranked search results from multiple queries running in parallel.
-1. Assign reciprocal rank scores for result in each of the ranked lists. RRF generates a new **`@search.score`** for each match in each result set. For each document in the search results, the engine assigns a reciprocal rank score based on its position in the list. The score is calculated as `1/(rank + k)`, where `rank` is the position of the document in the list, and `k` is a constant, which was experimentally observed to perform best if it's set to a small value like 60. **Note that this `k` value is a constant in the RRF algorithm and entirely separate from the `k` that controls the number of nearest neighbors.**
+1. Assign reciprocal rank scores for results in each of the ranked lists. RRF generates a new `@search.score` for each match in each result set. For each document in the search results, the engine assigns a reciprocal rank score based on its position in the list. The score is calculated as `1/(rank + k)`, where `rank` is the position of the document in the list and `k` is a constant. Experiments show the algorithm performs best when you set `k` to a small value, such as 60. **Note that this `k` value is a constant in the RRF algorithm and entirely separate from the `k` that controls the number of nearest neighbors.**
1. Combine scores. For each document, the engine sums the reciprocal rank scores obtained from each search system, producing a combined score for each document.
@@ -37,21 +37,21 @@ Only fields marked as `searchable` in the index, or `searchFields` in the query,
RRF is used anytime there's more than one query execution. The following examples illustrate query patterns where parallel query execution occurs:
-+ A full text query, plus one vector query (simple hybrid scenario), equals two query executions.
-+ A full text query, plus one vector query targeting two vector fields, equals three query executions.
-+ A full text query, plus two vector queries targeting five vector fields, equals 11 query executions
++ A full-text query, plus one vector query (simple hybrid scenario), equals two query executions.
++ A full-text query, plus one vector query targeting two vector fields, equals three query executions.
++ A full-text query, plus two vector queries targeting five vector fields, equals 11 query executions.
-## Scores in a hybrid search results
+## Scores in hybrid search results
-Whenever results are ranked, **`@search.score`** property contains the value used to order the results. Scores are generated by ranking algorithms that vary for each method. Each algorithm has its own range and magnitude.
+Whenever results are ranked, the `@search.score` property contains the value used to order the results. Scores are generated by ranking algorithms that vary for each method. Each algorithm has its own range and magnitude.
The following chart identifies the scoring property returned on each match, algorithm, and range of scores for each relevance ranking algorithm. For more information and a diagram of the scoring workflow, see [Relevance in Azure AI Search](search-relevance-overview.md).
| Search method | Parameter | Scoring algorithm | Range |
|---------------|-----------|-------------------|-------|
| full-text search | `@search.score` | BM25 algorithm | No upper limit. |
| vector search | `@search.score` | HNSW algorithm, using the similarity metric specified in the HNSW configuration. | 0.333 - 1.00 (Cosine), 0 to 1 for Euclidean and DotProduct. |
-| hybrid search | `@search.score` | RRF algorithm | Upper limit is bounded by the number of queries being fused, with each query contributing a maximum of approximately `1/k` to the RRF score (this is the `k` parameter in the RRF algorithm, not the vector query). For example, merging three queries would produce higher RRF scores than if only two search results are merged. |
+| hybrid search | `@search.score` | RRF algorithm | Upper limit is bounded by the number of queries being fused, with each query contributing a maximum of approximately `1/k` to the RRF score (this is the `k` parameter in the RRF algorithm, not the vector query). For example, merging three queries produces higher RRF scores than if only two search results are merged. |
| semantic ranking | `@search.rerankerScore` | Semantic ranking | 0.00 - 4.00 |
Semantic ranking occurs after RRF merging of results. Its score (`@search.rerankerScore`) is always reported separately in the query response. Semantic ranker can rerank full text and hybrid search results, assuming those results include fields having semantically rich content. It can rerank pure vector queries if the search documents include text fields that contain semantically relevant content.
@@ -123,19 +123,19 @@ The document's position in each result set corresponds to an initial score, whic
If you add vector weighting, the initial scores are subject to a weighting multiplier that increases or decreases the score. The default is 1.0, which means no weighting and the initial score is used as-is in RRF scoring. However, if you add a weight of 0.5, the score is reduced and that result becomes less important in the combined ranking. Conversely, if you add a weight of 2.0, the score becomes a larger factor in the overall RRF score.
-In this example, the @search.score (weighted) values are passed to the RRF ranking model.
+In this example, the `@search.score` (weighted) values go to the RRF ranking model.
## Number of ranked results in a hybrid query response
-By default, if you aren't using pagination, the search engine returns the top 50 highest ranking matches for full text search, and the most similar `k` matches for vector search. In a hybrid query, `top` determines the number of results in the response. Based on defaults, the top 50 highest ranked matches of the unified result set are returned.
+By default, if you aren't using pagination, the search engine returns the top 50 highest ranking matches for full-text search, and the most similar `k` matches for vector search. In a hybrid query, `top` determines the number of results in the response. Based on defaults, the top 50 highest ranked matches of the unified result set are returned.
Often, the search engine finds more results than `top` and `k`. To return more results, use the paging parameters `top`, `skip`, and `next`. Paging is how you determine the number of results on each logical page and navigate through the full payload. You can [set `maxTextRecallSize`](hybrid-search-how-to-query.md#set-maxtextrecallsize-and-countandfacetmode) to larger values (the default is 1,000) to return more results from the text side of hybrid query.
-By default, full text search is subject to a maximum limit of 1,000 matches (see [API response limits](search-limits-quotas-capacity.md#api-response-limits)). Once 1,000 matches are found, the search engine no longer looks for more.
+By default, full-text search is subject to a maximum limit of 1,000 matches (see [API response limits](search-limits-quotas-capacity.md#api-response-limits)). Once 1,000 matches are found, the search engine no longer looks for more.
For more information, see [How to work with search results](search-pagination-page-layout.md).
-## See also
+## Related content
+ [Hybrid search](hybrid-search-overview.md)
+ [Vector search](vector-search-overview.md)
Summary
{
"modification_type": "minor update",
"modification_title": "Minor update to the hybrid search ranking article"
}
Explanation
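This change is a minor update to the Azure AI Search article on hybrid search ranking. It mainly adjusts wording and formatting so that the information reads more clearly. The specific changes are:
- The `title` metadata changed from "Hybrid search scoring (RRF)" to "Hybrid Search Scoring (RRF)", switching it to title case.
- The `description` changed from "Describes the Reciprocal Rank Fusion (RRF) algorithm…" to "Learn about the Reciprocal Rank Fusion (RRF) algorithm…", which addresses the reader more directly.
- `ms.date` changed from 09/28/2025 to 01/21/2026, reflecting the new revision date.
- In the numbered steps, "Obtain ranked search results from multiple queries executing in parallel." became "Get ranked search results from multiple queries running in parallel.", a simpler phrasing.
- The `@search.score` property is now formatted consistently as plain inline code. The earlier bold-plus-backticks form (``**`@search.score`**``) and one unformatted mention were both changed to backticks only.
- Other edits hyphenate "full-text", rename the heading "Scores in a hybrid search results" to "Scores in hybrid search results", and rename "See also" to "Related content".
Overall, the article's formatting is more consistent and easier to read, which makes the hybrid search ranking concepts easier to follow.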
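The scoring steps quoted in the diff above (assign `1/(rank + k)` per list, then sum per document) can be sketched in a few lines of Python. This is only an illustration of the formula as the article states it, not Azure AI Search's implementation. The function name `rrf_fuse`, the sample document IDs, and the choice of 1-based ranks are assumptions; the article excerpt doesn't say whether ranks start at 0 or 1.

```python
from collections import defaultdict

RRF_K = 60  # RRF constant from the article; unrelated to the vector query's k


def rrf_fuse(ranked_lists, k=RRF_K):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Assumption: ranks are 1-based, so the first item in each list has rank 1.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Each list contributes 1 / (rank + k) to the document's fused score.
            scores[doc_id] += 1.0 / (rank + k)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result sets: one from full-text search, one from vector search.
text_results = ["doc1", "doc2", "doc3"]
vector_results = ["doc2", "doc4", "doc1"]

fused = rrf_fuse([text_results, vector_results])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Under these assumptions, a document can earn at most `1/(1 + k)` from each list, so fusing two lists caps the score at `2/(1 + k)`. That matches the table's note that the upper limit grows with the number of queries being fused.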
articles/search/includes/quickstarts/agentic-retrieval-csharp.md
Diff
@@ -591,7 +591,7 @@ await indexClient.CreateOrUpdateIndexAsync(index);
Console.WriteLine($"Index '{indexName}' created or updated successfully.");
```
-Reference: [SearchField](/dotnet/api/azure.search.documents.indexes.models.searchfield), [SimpleField](/dotnet/api/azure.search.documents.indexes.models.simplefield), [VectorSearch](/dotnet/api/azure.search.documents.indexes.models.vectorsearch), [SemanticSearch](/dotnet/api/azure.search.documents.indexes.models.semanticsearch), [SearchIndex](/dotnet/api/azure.search.documents.indexes.models.searchindex), [SearchIndexClient](/dotnet/api/azure.search.documents.indexes.searchindexclient)
+**Reference:** [SearchField](/dotnet/api/azure.search.documents.indexes.models.searchfield), [SimpleField](/dotnet/api/azure.search.documents.indexes.models.simplefield), [VectorSearch](/dotnet/api/azure.search.documents.indexes.models.vectorsearch), [SemanticSearch](/dotnet/api/azure.search.documents.indexes.models.semanticsearch), [SearchIndex](/dotnet/api/azure.search.documents.indexes.models.searchindex), [SearchIndexClient](/dotnet/api/azure.search.documents.indexes.searchindexclient)
### Upload documents to the index
@@ -619,7 +619,7 @@ await searchIndexingBufferedSender.FlushAsync();
Console.WriteLine($"Documents uploaded to index '{indexName}' successfully.");
```
-Reference: [SearchClient](/dotnet/api/azure.search.documents.searchclient), [SearchIndexingBufferedSender](/dotnet/api/azure.search.documents.searchindexingbufferedsender-1)
+**Reference:** [SearchClient](/dotnet/api/azure.search.documents.searchclient), [SearchIndexingBufferedSender](/dotnet/api/azure.search.documents.searchindexingbufferedsender-1)
### Create a knowledge source
@@ -641,7 +641,7 @@ await indexClient.CreateOrUpdateKnowledgeSourceAsync(indexKnowledgeSource);
Console.WriteLine($"Knowledge source '{knowledgeSourceName}' created or updated successfully.");
```
-Reference: [SearchIndexKnowledgeSource](/dotnet/api/azure.search.documents.indexes.models.searchindexknowledgesource)
+**Reference:** [SearchIndexKnowledgeSource](/dotnet/api/azure.search.documents.indexes.models.searchindexknowledgesource)
### Create a knowledge base
@@ -675,7 +675,7 @@ await indexClient.CreateOrUpdateKnowledgeBaseAsync(knowledgeBase);
Console.WriteLine($"Knowledge base '{knowledgeBaseName}' created or updated successfully.");
```
-Reference: [KnowledgeBaseAzureOpenAIModel](/dotnet/api/azure.search.documents.indexes.models.knowledgebaseazureopenaimodel), [KnowledgeBase](/dotnet/api/azure.search.documents.indexes.models.knowledgebase)
+**Reference:** [KnowledgeBaseAzureOpenAIModel](/dotnet/api/azure.search.documents.indexes.models.knowledgebaseazureopenaimodel), [KnowledgeBase](/dotnet/api/azure.search.documents.indexes.models.knowledgebase)
### Set up messages
@@ -738,7 +738,7 @@ messages.Add(new Dictionary<string, string>
});
```
-Reference: [KnowledgeBaseRetrievalClient](/dotnet/api/azure.search.documents.knowledgebases.knowledgebaseretrievalclient?view=azure-dotnet-preview&preserve-view=true), [KnowledgeBaseRetrievalRequest](/dotnet/api/azure.search.documents.knowledgebases.models.knowledgebaseretrievalrequest?view=azure-dotnet-preview&preserve-view=true)
+**Reference:** [KnowledgeBaseRetrievalClient](/dotnet/api/azure.search.documents.knowledgebases.knowledgebaseretrievalclient?view=azure-dotnet-preview&preserve-view=true), [KnowledgeBaseRetrievalRequest](/dotnet/api/azure.search.documents.knowledgebases.models.knowledgebaseretrievalrequest?view=azure-dotnet-preview&preserve-view=true)
#### Review the response, activity, and references
Summary
{
"modification_type": "minor update",
"modification_title": "Minor update to the C# agentic retrieval quickstart"
}
Explanation
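This change is a minor update to the C# agentic retrieval quickstart. It standardizes how the reference links after each code sample are labeled, which makes the document more consistent. The main changes are:
- The label at the start of each reference line changed from `Reference:` to `**Reference:**`, so it renders in bold and stands out.
- The format and targets of the reference links are unchanged.
- The body text of each section is unchanged. Only the reference label was restyled, which makes related links easier to find.
These edits improve the readability and scannability of the document and make it a more usable source for readers.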
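For illustration only, a repetitive label change like this one could be scripted. Nothing in the source says how the authors actually made the edit; the function name `bold_reference_labels` and the sample text are hypothetical, and the sketch assumes the label always sits at the very start of a line.

```python
import re

# Matches the plain label "Reference:" at the start of a line.
# Lines that already start with "**Reference:**" begin with "*" and don't match.
REFERENCE_LABEL = re.compile(r"^Reference:", flags=re.MULTILINE)


def bold_reference_labels(markdown_text):
    """Return the text with each leading 'Reference:' label made bold."""
    return REFERENCE_LABEL.sub("**Reference:**", markdown_text)


sample = (
    "Reference: [SearchClient](/dotnet/api/azure.search.documents.searchclient)\n"
    "**Reference:** [KnowledgeBase](/dotnet/api/azure.search.documents.indexes.models.knowledgebase)\n"
)
print(bold_reference_labels(sample))
```

Because already-bold labels are skipped, running the function twice gives the same result as running it once.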
articles/search/includes/quickstarts/agentic-retrieval-javascript.md
Diff
@@ -666,7 +666,7 @@ const searchClient = new SearchClient(process.env.AZURE_SEARCH_ENDPOINT, 'earth_
await searchIndexClient.createOrUpdateIndex(index);
```
-Reference: [SearchField](/javascript/api/@azure/search-documents/searchfield), [VectorSearch](/javascript/api/@azure/search-documents/vectorsearch), [SemanticSearch](/javascript/api/@azure/search-documents/semanticsearch), [SearchIndex](/javascript/api/@azure/search-documents/searchindex), [SearchIndexClient](/javascript/api/@azure/search-documents/searchindexclient), [SearchClient](/javascript/api/@azure/search-documents/searchclient), [DefaultAzureCredential](/javascript/api/@azure/identity/defaultazurecredential)
+**Reference:** [SearchField](/javascript/api/@azure/search-documents/searchfield), [VectorSearch](/javascript/api/@azure/search-documents/vectorsearch), [SemanticSearch](/javascript/api/@azure/search-documents/semanticsearch), [SearchIndex](/javascript/api/@azure/search-documents/searchindex), [SearchIndexClient](/javascript/api/@azure/search-documents/searchindexclient), [SearchClient](/javascript/api/@azure/search-documents/searchclient), [DefaultAzureCredential](/javascript/api/@azure/identity/defaultazurecredential)
### Upload documents to the index
@@ -708,7 +708,7 @@ while (count !== documents.length) {
console.log(`✓ All ${documents.length} documents indexed successfully!`);
```
-Reference: [SearchIndexingBufferedSender](/javascript/api/@azure/search-documents/searchindexingbufferedsender)
+**Reference:** [SearchIndexingBufferedSender](/javascript/api/@azure/search-documents/searchindexingbufferedsender)
### Create a knowledge source
@@ -733,7 +733,7 @@ await searchIndexClient.createKnowledgeSource({
console.log(`✅ Knowledge source 'earth-knowledge-source' created successfully.`);
```
-Reference: [SearchIndexKnowledgeSource](/javascript/api/@azure/search-documents/searchindexknowledgesource)
+**Reference:** [SearchIndexKnowledgeSource](/javascript/api/@azure/search-documents/searchindexknowledgesource)
### Create a knowledge base
@@ -766,7 +766,7 @@ await searchIndexClient.createKnowledgeBase({
console.log(`✅ Knowledge base 'earth-knowledge-base' created successfully.`);
```
-Reference: [KnowledgeBase](/javascript/api/@azure/search-documents/knowledgebase)
+**Reference:** [KnowledgeBase](/javascript/api/@azure/search-documents/knowledgebase)
### Run the retrieval pipeline
@@ -816,7 +816,7 @@ const retrievalRequest = {
const result = await knowledgeRetrievalClient.retrieveKnowledge(retrievalRequest);
```
-Reference: [KnowledgeRetrievalClient](/javascript/api/@azure/search-documents/knowledgeretrievalclient), [KnowledgeBaseRetrievalRequest](/javascript/api/@azure/search-documents/knowledgebaseretrievalrequest)
+**Reference:** [KnowledgeRetrievalClient](/javascript/api/@azure/search-documents/knowledgeretrievalclient), [KnowledgeBaseRetrievalRequest](/javascript/api/@azure/search-documents/knowledgebaseretrievalrequest)
### Review the response, activity, and references
Summary
{
"modification_type": "minor update",
"modification_title": "Minor update to the JavaScript agentic retrieval quickstart"
}
Explanation
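This change is a minor update to the JavaScript agentic retrieval quickstart. It standardizes the style of the reference lines, which improves overall readability. The key changes are:
- The label at the start of each reference line changed from `Reference:` to `**Reference:**`, so it renders in bold and related links are easier to find.
- The reference content is unchanged, and every link keeps its original target.
- With the same label style in every section, the document is more consistent.
These edits give the document a more polished look and help readers reach the information they need quickly. It's a small change that improves readability and usability.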
articles/search/includes/quickstarts/agentic-retrieval-python.md
Diff
@@ -514,7 +514,7 @@ index_client.create_or_update_index(index)
print(f"Index '{index_name}' created or updated successfully.")
```
-Reference: [SearchField](/python/api/azure-search-documents/azure.search.documents.indexes.models.searchfield), [VectorSearch](/python/api/azure-search-documents/azure.search.documents.indexes.models.vectorsearch), [SemanticSearch](/python/api/azure-search-documents/azure.search.documents.indexes.models.semanticsearch), [SearchIndex](/python/api/azure-search-documents/azure.search.documents.indexes.models.searchindex), [SearchIndexClient](/python/api/azure-search-documents/azure.search.documents.indexes.searchindexclient)
+**Reference:** [SearchField](/python/api/azure-search-documents/azure.search.documents.indexes.models.searchfield), [VectorSearch](/python/api/azure-search-documents/azure.search.documents.indexes.models.vectorsearch), [SemanticSearch](/python/api/azure-search-documents/azure.search.documents.indexes.models.semanticsearch), [SearchIndex](/python/api/azure-search-documents/azure.search.documents.indexes.models.searchindex), [SearchIndexClient](/python/api/azure-search-documents/azure.search.documents.indexes.searchindexclient)
### Upload documents to the index
@@ -531,7 +531,7 @@ with SearchIndexingBufferedSender(endpoint=search_endpoint, index_name=index_nam
print(f"Documents uploaded to index '{index_name}' successfully.")
```
-Reference: [SearchIndexingBufferedSender](/python/api/azure-search-documents/azure.search.documents.searchindexingbufferedsender)
+**Reference:** [SearchIndexingBufferedSender](/python/api/azure-search-documents/azure.search.documents.searchindexingbufferedsender)
### Create a knowledge source
@@ -555,7 +555,7 @@ index_client.create_or_update_knowledge_source(knowledge_source=ks)
print(f"Knowledge source '{knowledge_source_name}' created or updated successfully.")
```
-Reference: [SearchIndexKnowledgeSource](/python/api/azure-search-documents/azure.search.documents.indexes.models.searchindexknowledgesource)
+**Reference:** [SearchIndexKnowledgeSource](/python/api/azure-search-documents/azure.search.documents.indexes.models.searchindexknowledgesource)
### Create a knowledge base
@@ -588,7 +588,7 @@ index_client.create_or_update_knowledge_base(knowledge_base)
print(f"Knowledge base '{knowledge_base_name}' created or updated successfully.")
```
-Reference: [KnowledgeBase](/python/api/azure-search-documents/azure.search.documents.indexes.models.knowledgebase)
+**Reference:** [KnowledgeBase](/python/api/azure-search-documents/azure.search.documents.indexes.models.knowledgebase)
### Set up messages
@@ -657,7 +657,7 @@ result = agent_client.retrieve(retrieval_request=req)
print(f"Retrieved content from '{knowledge_base_name}' successfully.")
```
-Reference: [KnowledgeBaseRetrievalClient](/python/api/azure-search-documents/azure.search.documents.knowledgebases.knowledgebaseretrievalclient), [KnowledgeBaseRetrievalRequest](/python/api/azure-search-documents/azure.search.documents.knowledgebases.models.knowledgebaseretrievalrequest)
+**Reference:** [KnowledgeBaseRetrievalClient](/python/api/azure-search-documents/azure.search.documents.knowledgebases.knowledgebaseretrievalclient), [KnowledgeBaseRetrievalRequest](/python/api/azure-search-documents/azure.search.documents.knowledgebases.models.knowledgebaseretrievalrequest)
#### Review the response, activity, and references
Summary
{
"modification_type": "minor update",
"modification_title": "Minor update to the Python agentic retrieval quickstart"
}
Explanation
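This change is a minor update to the Python agentic retrieval quickstart. The main revision unifies and tidies how references are displayed. The specific changes are:
- The label in front of each reference list changed from `Reference:` to `**Reference:**`, so it renders in bold. Related links are now clearly set apart and quicker to find.
- The links in each reference line are unchanged and keep their original targets.
- With one reference format across all sections, the document is more consistent and easier to read.
These edits give the document a more polished look and help readers reach the information they need quickly, improving its readability and usability.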
articles/search/includes/quickstarts/agentic-retrieval-rest.md
Diff
@@ -469,7 +469,7 @@ Authorization: Bearer {{token}}
}
```
-Reference: [Indexes - Create](/rest/api/searchservice/indexes/create)
+**Reference:** [Indexes - Create](/rest/api/searchservice/indexes/create)
### Upload documents to the index
@@ -505,7 +505,7 @@ Authorization: Bearer {{token}}
}
```
-Reference: [Documents - Index](/rest/api/searchservice/documents/index)
+**Reference:** [Documents - Index](/rest/api/searchservice/documents/index)
### Create a knowledge source
@@ -534,7 +534,7 @@ Authorization: Bearer {{token}}
}
```
-Reference: [Knowledge Sources - Create](/rest/api/searchservice/knowledge-sources/create?view=rest-searchservice-2025-11-01-preview&preserve-view=true)
+**Reference:** [Knowledge Sources - Create](/rest/api/searchservice/knowledge-sources/create?view=rest-searchservice-2025-11-01-preview&preserve-view=true)
### Create a knowledge base
@@ -570,7 +570,7 @@ Authorization: Bearer {{token}}
}
```
-Reference: [Knowledge Bases - Create](/rest/api/searchservice/knowledge-bases/create?view=rest-searchservice-2025-11-01-preview&preserve-view=true)
+**Reference:** [Knowledge Bases - Create](/rest/api/searchservice/knowledge-bases/create?view=rest-searchservice-2025-11-01-preview&preserve-view=true)
### Run the retrieval pipeline
@@ -615,7 +615,7 @@ Authorization: Bearer {{token}}
}
```
-Reference: [Knowledge Retrieval - Retrieve](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-11-01-preview&preserve-view=true)
+**Reference:** [Knowledge Retrieval - Retrieve](/rest/api/searchservice/knowledge-retrieval/retrieve?view=rest-searchservice-2025-11-01-preview&preserve-view=true)
The output should contain the following components:
Summary
{
"modification_type": "minor update",
"modification_title": "Minor update to the REST API agentic retrieval quickstart"
}
Explanation
This change is a minor update to the REST API agentic retrieval quickstart and only affects how references are displayed. The specific changes are:
- The label at the start of each reference line changed from `Reference:` to `**Reference:**`, so it renders in bold and each link is easy to identify.
- The reference content itself is unchanged, and the original links are kept.
- With a single reference style throughout, the document is more consistent and easier to read.
These edits make the document tidier and help readers reach the information they need quickly. They also give it a more polished look and improve its usability.
articles/search/includes/quickstarts/agentic-retrieval-typescript.md
Diff
@@ -725,7 +725,7 @@ const searchClient = new SearchClient<EarthAtNightDocument>(process.env.AZURE_SE
await searchIndexClient.createOrUpdateIndex(index);
```
-Reference: [SearchField](/javascript/api/@azure/search-documents/searchfield), [VectorSearch](/javascript/api/@azure/search-documents/vectorsearch), [SemanticSearch](/javascript/api/@azure/search-documents/semanticsearch), [SearchIndex](/javascript/api/@azure/search-documents/searchindex), [SearchIndexClient](/javascript/api/@azure/search-documents/searchindexclient), [SearchClient](/javascript/api/@azure/search-documents/searchclient), [DefaultAzureCredential](/javascript/api/@azure/identity/defaultazurecredential)
+**Reference:** [SearchField](/javascript/api/@azure/search-documents/searchfield), [VectorSearch](/javascript/api/@azure/search-documents/vectorsearch), [SemanticSearch](/javascript/api/@azure/search-documents/semanticsearch), [SearchIndex](/javascript/api/@azure/search-documents/searchindex), [SearchIndexClient](/javascript/api/@azure/search-documents/searchindexclient), [SearchClient](/javascript/api/@azure/search-documents/searchclient), [DefaultAzureCredential](/javascript/api/@azure/identity/defaultazurecredential)
### Upload documents to the index
@@ -767,7 +767,7 @@ while (count !== documents.length) {
console.log(`✓ All ${documents.length} documents indexed successfully!`);
```
-Reference: [SearchIndexingBufferedSender](/javascript/api/@azure/search-documents/searchindexingbufferedsender)
+**Reference:** [SearchIndexingBufferedSender](/javascript/api/@azure/search-documents/searchindexingbufferedsender)
### Create a knowledge source
@@ -792,7 +792,7 @@ await searchIndexClient.createKnowledgeSource({
console.log(`✅ Knowledge source 'earth-knowledge-source' created successfully.`);
```
-Reference: [SearchIndexKnowledgeSource](/javascript/api/@azure/search-documents/searchindexknowledgesource)
+**Reference:** [SearchIndexKnowledgeSource](/javascript/api/@azure/search-documents/searchindexknowledgesource)
### Create a knowledge base
@@ -825,7 +825,7 @@ await searchIndexClient.createKnowledgeBase({
console.log(`✅ Knowledge base 'earth-knowledge-base' created successfully.`);
```
-Reference: [KnowledgeBase](/javascript/api/@azure/search-documents/knowledgebase)
+**Reference:** [KnowledgeBase](/javascript/api/@azure/search-documents/knowledgebase)
### Run the retrieval pipeline
@@ -875,7 +875,7 @@ const retrievalRequest = {
const result = await knowledgeRetrievalClient.retrieveKnowledge(retrievalRequest);
```
-Reference: [KnowledgeRetrievalClient](/javascript/api/@azure/search-documents/knowledgeretrievalclient), [KnowledgeBaseRetrievalRequest](/javascript/api/@azure/search-documents/knowledgebaseretrievalrequest)
+**Reference:** [KnowledgeRetrievalClient](/javascript/api/@azure/search-documents/knowledgeretrievalclient), [KnowledgeBaseRetrievalRequest](/javascript/api/@azure/search-documents/knowledgebaseretrievalrequest)
### Review the response, activity, and references
Summary
{
"modification_type": "minor update",
"modification_title": "Minor update to the TypeScript agentic retrieval quickstart"
}
Explanation
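This change is a minor update to the TypeScript agentic retrieval quickstart. It improves how references are displayed. Specifically:
- Each reference label changed from `Reference:` to `**Reference:**`, so it renders in bold and readers can spot the reference line at a glance.
- The reference content is unchanged, and the link targets are kept as they were.
- With every reference line styled the same way, the document is more consistent and easier to read.
These revisions give the document a more polished look and help readers reach the information they need quickly, which improves its overall quality.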
articles/search/includes/quickstarts/search-get-started-vector-rest.md
Diff
@@ -950,7 +950,7 @@ To create a hybrid search:
1. Select **Send Request**. You should have an `HTTP/1.1 200 OK` response. The response body should include the JSON representation of the search results.
- Because this is a hybrid query, results are [ranked by Reciprocal Rank Fusion (RRF)](../../hybrid-search-ranking.md#scores-in-a-hybrid-search-results). Notice that `@search.score` values have a different basis and are uniformly smaller values. RRF evaluates search scores of multiple search results, takes the inverse, and then merges and sorts the combined results. The `top` number of results are returned.
+ Because this is a hybrid query, results are [ranked by Reciprocal Rank Fusion (RRF)](../../hybrid-search-ranking.md#scores-in-hybrid-search-results). Notice that `@search.score` values have a different basis and are uniformly smaller values. RRF evaluates search scores of multiple search results, takes the inverse, and then merges and sorts the combined results. The `top` number of results are returned.
Review the response, consisting of the top k-5 matches out of 7 matching documents in the index:
Summary
{
"modification_type": "minor update",
"modification_title": "Anchor link fix in the vector search quickstart (REST)"
}
Explanation
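This change is a small fix to the REST vector search quickstart. One sentence in the hybrid search step was updated.
The visible text of the sentence is unchanged. Only the link target changed: the anchor `#scores-in-a-hybrid-search-results` became `#scores-in-hybrid-search-results`, which matches the renamed heading "Scores in hybrid search results" in hybrid-search-ranking.md.
Nothing else in the file changed, and the explanation itself reads as before.
This small adjustment keeps the cross-reference pointing at the right section, so readers aren't sent to a missing anchor.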
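The anchor change follows from the heading rename, because a section anchor is derived from the heading text. The sketch below is a simplified approximation (lowercase, strip punctuation, hyphenate spaces) and not the publishing platform's actual algorithm; the function name `heading_to_anchor` is hypothetical.

```python
import re


def heading_to_anchor(heading):
    """Approximate a heading's anchor: lowercase, drop punctuation, hyphenate spaces."""
    text = heading.strip().lower()
    text = re.sub(r"[^a-z0-9\s-]", "", text)  # keep letters, digits, spaces, hyphens
    return re.sub(r"\s+", "-", text)


old_heading = "Scores in a hybrid search results"
new_heading = "Scores in hybrid search results"

print(heading_to_anchor(old_heading))  # scores-in-a-hybrid-search-results
print(heading_to_anchor(new_heading))  # scores-in-hybrid-search-results
```

Removing the stray "a" from the heading removes `-a` from the anchor, which is why the link in the quickstart had to change along with it.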
articles/search/includes/search-blob-data-sources.md
Diff
@@ -2,16 +2,18 @@
author: mgottein
ms.service: azure-ai-search
ms.topic: include
-ms.date: 05/02/2019
+ms.date: 01/16/2026
ms.author: magottei
---
+
* CSV (see [Indexing CSV blobs](../search-how-to-index-azure-blob-csv.md))
* EML
* EPUB
* GZ
* HTML
* JSON (see [Indexing JSON blobs](../search-how-to-index-azure-blob-json.md))
* KML (XML for geographic representations)
+* Markdown
* Microsoft Office formats: DOCX/DOC/DOCM, XLSX/XLS/XLSM, PPTX/PPT/PPTM, MSG (Outlook emails), XML (both 2003 and 2006 WORD XML)
* Open Document formats: ODT, ODS, ODP
* PDF
Summary
{
"modification_type": "minor update",
"modification_title": "Update to the blob data sources include file"
}
Explanation
This change is a minor update to the include file that lists blob data source formats. The main changes are:
- Date update: `ms.date` changed from 05/02/2019 to 01/16/2026, which marks the file as recently revised.
- New list item: Markdown was added to the list of supported formats, broadening the documented choices for content in blob storage.
These changes keep the file accurate and current. A complete list of the formats that blob storage indexing supports also helps readers choose an appropriate data format.
articles/search/media/search-get-started-rest/get-endpoint.png
Summary
{
"modification_type": "minor update",
"modification_title": "No diff in image file"
}
Explanation
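This change concerns the image file get-endpoint.png, and the diff shows no content changes. There are no additions or deletions, so the file carries no new information. An update like this is sometimes recorded for reasons such as metadata or version-control bookkeeping, but the image itself is unaffected.
The change likely exists to preserve information that helps manage the document's visual content in future updates or revisions. This diff therefore has no particular impact on users.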
articles/search/media/service-configure-firewall/azure-portal-firewall-all.png
Summary
{
"modification_type": "minor update",
"modification_title": "No diff in image file"
}
Explanation
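This change concerns the image file azure-portal-firewall-all.png, and the diff shows no content changes. There are no additions or deletions, so the file provides no new information to users.
An update like this is sometimes made for file management or tracking purposes, such as version control or organizing information within a team. In this case, the visual content itself is unaffected.
This diff therefore doesn't represent a significant change. It only indicates that the image file remains available.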
articles/search/search-blob-indexer-role-based-access.md
Diff
@@ -1,5 +1,5 @@
---
-title: Use a Blob indexer or knowledge source to ingest RBAC scopes metadata
+title: Use a Blob Indexer or Knowledge Source to Ingest RBAC Scopes Metadata
titleSuffix: Azure AI Search
description: Learn how to configure Azure AI Search knowledge sources and indexers for ingesting Azure Role-Based Access (RBAC) metadata on Azure blobs.
ms.service: azure-ai-search
@@ -9,7 +9,7 @@ author: vaishalishah
ms.author: vaishalishah
---
-# Use a Blob indexer or knowledge source to ingest RBAC scopes metadata
+# Use a blob indexer or knowledge source to ingest RBAC scopes metadata
[!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
Summary
{
"modification_type": "minor update",
"modification_title": "Title style fix"
}
Explanation
This change applies to the Markdown file search-blob-indexer-role-based-access.md and mainly consists of style fixes to the title. Specifically, the `title` metadata changed as follows:
- From "Use a Blob indexer or knowledge source to ingest RBAC scopes metadata" to "Use a Blob Indexer or Knowledge Source to Ingest RBAC Scopes Metadata". The title now uses title case, capitalizing words such as "Indexer" and "Knowledge Source".
The H1 heading in the body of the document was also adjusted: "Blob indexer" became "blob indexer", while the rest of the heading stays in sentence case.
Overall, this change is a style and readability improvement and doesn't affect any functionality or substantive information.
articles/search/search-how-to-index-azure-blob-storage.md
Diff
@@ -1,13 +1,13 @@
---
-title: Azure blob indexer
+title: Azure Blob Indexer
titleSuffix: Azure AI Search
-description: Set up an Azure blob indexer to automate indexing of blob content for full text search operations and knowledge mining in Azure AI Search.
+description: Learn how to set up a blob indexer to automate indexing of Azure Blob Storage content for full-text search, knowledge mining, and other scenarios in Azure AI Search.
author: gmndrg
ms.author: gimondra
manager: vinodva
ms.service: azure-ai-search
ms.topic: how-to
-ms.date: 05/08/2025
+ms.date: 01/21/2026
ms.update-cycle: 180-days
ms.custom:
- ignite-2023
@@ -17,27 +17,23 @@ ms.custom:
# Index data from Azure Blob Storage
-In this article, learn how to configure an [**indexer**](search-indexer-overview.md) that imports content from Azure Blob Storage and makes it searchable in Azure AI Search. Inputs to the indexer are your blobs, in a single container. Output is a search index with searchable content and metadata stored in individual fields.
+In this article, you learn how to configure an [indexer](search-indexer-overview.md) that imports content from Azure Blob Storage and makes it searchable in Azure AI Search. The indexer receives blobs in a single container as input. The output is a search index that stores searchable content and metadata in individual fields.
-To configure and run the indexer, you can use:
+This article uses the [Search Service REST APIs](/rest/api/searchservice) to demonstrate how to configure and run the indexer. However, you can also use:
-+ [Search Service REST API](/rest/api/searchservice), any version.
-+ An Azure SDK package, any version.
-+ [**Import data** wizard](search-get-started-portal.md) in the Azure portal.
-+ [**Import data (new)** wizard](search-get-started-portal-import-vectors.md) in the Azure portal.
-
-This article uses the REST APIs to illustrate each step.
++ An Azure SDK package (any version)
++ [**Import data (new)** wizard](search-get-started-portal-import-vectors.md) in the Azure portal
> [!NOTE]
-> Azure AI Search can now ingest RBAC scope during indexing and transfers those permissions to indexed content in the search index. For more information about RBAC scope during indexing, see [Indexing Azure Role-Based Access Control scope using Indexers](search-blob-indexer-role-based-access.md).
+> Azure AI Search can ingest role-based access control (RBAC) scope during indexing and transfer those permissions to indexed content in a search index. For more information, see [Use a blob indexer or knowledge source to ingest RBAC scopes metadata](search-blob-indexer-role-based-access.md).
## Prerequisites
+ [Azure Blob Storage](/azure/storage/blobs/storage-blobs-overview), Standard performance (general-purpose v2).
+ [Access tiers](/azure/storage/blobs/access-tiers-overview) include hot, cool, cold, and archive. Indexers can retrieve blobs on hot, cool, and cold access tiers.
-+ Blobs providing text content and metadata. If blobs contain binary content or unstructured text, consider adding [AI enrichment](cognitive-search-concept-intro.md) for image and natural language processing. Blob content can’t exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your search service tier.
++ Blobs providing text content and metadata. If blobs contain binary content or unstructured text, consider adding [AI enrichment](cognitive-search-concept-intro.md) for image and natural language processing. Blob content can't exceed the [indexer limits](search-limits-quotas-capacity.md#indexer-limits) for your pricing tier.
+ A supported network configuration and data access. At a minimum, you need read permissions in Azure Storage. A storage connection string that includes an access key gives you read access to storage content. If instead you're using Microsoft Entra logins and roles, make sure the [search service's managed identity](search-how-to-managed-identities.md) has **Storage Blob Data Reader** permissions.
@@ -51,9 +47,9 @@ You can use this indexer for the following tasks:
+ **Data indexing and incremental indexing:** The indexer can index files and associated metadata from blob containers and folders. It detects new and updated files and metadata through built-in change detection. You can configure data refresh on a schedule or on demand.
+ **Deletion detection:** The indexer can [detect deletions through native soft delete or through custom metadata](search-how-to-index-azure-blob-changed-deleted.md).
-+ **Applied AI through skillsets:** [Skillsets](cognitive-search-concept-intro.md) are fully supported by the indexer. This includes key features like [integrated vectorization](vector-search-integrated-vectorization.md) that adds data chunking and embedding steps.
++ **Applied AI through skillsets:** The indexer fully supports [skillsets](cognitive-search-concept-intro.md). This support includes key features like [integrated vectorization](vector-search-integrated-vectorization.md), which adds data chunking and embedding.
+ **Parsing modes:** The indexer supports [JSON parsing modes](search-how-to-index-azure-blob-json.md) if you want to parse JSON arrays or lines into individual search documents. It also supports [Markdown parsing mode](search-how-to-index-azure-blob-markdown.md).
-+ **Compatibility with other features:** The indexer is designed to work seamlessly with other indexer features, such as [debug sessions](cognitive-search-debug-session.md), [indexer cache for incremental enrichments](enrichment-cache-how-to-configure.md), and [knowledge store](knowledge-store-concept-intro.md).
++ **Compatibility with other features:** The indexer works seamlessly with other indexer features, such as [debug sessions](cognitive-search-debug-session.md), [indexer cache for incremental enrichments](enrichment-cache-how-to-configure.md), and [knowledge store](knowledge-store-concept-intro.md).
<a name="SupportedFormats"></a>
@@ -65,57 +61,54 @@ The blob indexer can extract text from the following document formats:
## Determine which blobs to index
-Before you set up indexing, review your source data to determine whether any changes should be made up front. An indexer can index content from one container at a time. By default, all blobs in the container are processed. You have several options for more selective processing:
+Before you set up indexing, review your source data to determine whether you need to make any changes. An indexer can index content from one container at a time. By default, the indexer processes all blobs in the container. You have several options for more selective processing:
-+ Place blobs in a virtual folder. An indexer [data source definition](#define-the-data-source) includes a "query" parameter that can take a virtual folder. If you specify a virtual folder, only those blobs in the folder are indexed.
++ Place blobs in a virtual folder. An indexer [data source definition](#define-the-data-source) includes a `query` parameter that can take a virtual folder. If you specify a virtual folder, the indexer indexes only those blobs in the folder.
-+ Include or exclude blobs by file type. The [supported document formats list](#SupportedFormats) can help you determine which blobs to exclude. For example, you might want to exclude image or audio files that don't provide searchable text. This capability is controlled through [configuration settings](#configure-and-run-the-blob-indexer) in the indexer.
++ Include or exclude blobs by file type. The [supported document formats list](#SupportedFormats) can help you determine which blobs to exclude. For example, you might want to exclude image or audio files that don't provide searchable text. You control this capability through [configuration settings](#configure-and-run-the-blob-indexer) in the indexer.
-+ Include or exclude arbitrary blobs. If you want to skip a specific blob for whatever reason, you can add the following metadata properties and values to blobs in Blob Storage. When an indexer encounters this property, it skips the blob or its content in the indexing run.
++ Include or exclude arbitrary blobs. To skip a specific blob, add the following metadata properties and values to blobs in Azure Blob Storage. When an indexer encounters this property, it skips the blob or its content in the indexing run.
| Property name | Property value | Explanation |
| ------------- | -------------- | ----------- |
- | "AzureSearch_Skip" |`"true"` |Instructs the blob indexer to completely skip the blob. Neither metadata nor content extraction is attempted. This is useful when a particular blob fails repeatedly and interrupts the indexing process. |
- | "AzureSearch_SkipContent" |`"true"` | Skips content and extracts just the metadata. this is equivalent to the `"dataToExtract" : "allMetadata"` setting described in [configuration settings](#configure-and-run-the-blob-indexer) , just scoped to a particular blob. |
+ | `AzureSearch_Skip` | `true` | Instructs the blob indexer to completely skip the blob. The indexer doesn't attempt to extract metadata or content. This property is useful when a particular blob fails repeatedly and interrupts the indexing process. |
+ | `AzureSearch_SkipContent` | `true` | The indexer skips content and extracts just the metadata. This property is equivalent to the `"dataToExtract": "allMetadata"` setting described in [configuration settings](#configure-and-run-the-blob-indexer), but it's scoped to a particular blob. |
-If you don't set up inclusion or exclusion criteria, the indexer reports an ineligible blob as an error and move on. If enough errors occur, processing might stop. You can specify error tolerance in the indexer [configuration settings](#configure-and-run-the-blob-indexer).
+If you don't set up inclusion or exclusion criteria, the indexer reports an ineligible blob as an error and moves on. If enough errors occur, processing might stop. You can specify error tolerance in the indexer [configuration settings](#configure-and-run-the-blob-indexer).
An indexer typically creates one search document per blob, where the text content and metadata are captured as searchable fields in an index. If blobs are whole files, you can potentially parse them into [multiple search documents](search-how-to-index-azure-blob-one-to-many.md). For example, you can parse rows in a [CSV file](search-how-to-index-azure-blob-csv.md) to create one search document per row.
-A compound or embedded document (such as a ZIP archive, a Word document with embedded Outlook email containing attachments, or an .MSG file with attachments) is also indexed as a single document. For example, all images extracted from the attachments of an .MSG file will be returned in the normalized_images field. If you have images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
-
-Textual content of a document is extracted into a string field named "content". You can also extract standard and user-defined metadata.
+A compound or embedded document (such as a ZIP archive, a Word document with embedded Outlook email containing attachments, or an .MSG file with attachments) is also indexed as a single document. For example, all images extracted from the attachments of an .MSG file are returned in the `normalized_images` field. If you have images, consider adding [AI enrichment](cognitive-search-concept-intro.md) to get more search utility from that content.
+The indexer extracts the textual content of a document into a string field named `content`. You can also extract standard and user-defined metadata.
<a name="indexing-blob-metadata"></a>
### Indexing blob metadata
-Blob metadata can also be indexed, and that's helpful if you think any of the standard or custom metadata properties are useful in filters and queries.
+You can also index blob metadata. This feature is helpful if you think any of the standard or custom metadata properties are useful in filters and queries.
-User-specified metadata properties are extracted verbatim. To receive the values, you must define field in the search index of type `Edm.String`, with same name as the metadata key of the blob. For example, if a blob has a metadata key of `Sensitivity` with value `High`, you should define a field named `Sensitivity` in your search index and it will be populated with the value `High`.
+The indexer extracts user-specified metadata properties verbatim. To receive the values, you must define a field in the search index of type `Edm.String` with the same name as the metadata key of the blob. For example, if a blob has a metadata key of `Sensitivity` with value `High`, define a field named `Sensitivity` in your search index. The index field populates with the value `High`.
-Standard blob metadata properties can be extracted into similarly named and typed fields, as listed below. The blob indexer automatically creates internal field mappings for these blob metadata properties, converting the original hyphenated name ("metadata-storage-name") to an underscored equivalent name ("metadata_storage_name").
+You can extract standard blob metadata properties into similarly named and typed fields, as listed below. The blob indexer automatically creates internal field mappings for these blob metadata properties, converting the original hyphenated name (`metadata-storage-name`) to an underscored equivalent name (`metadata_storage_name`).
You still have to add the underscored fields to the index definition, but you can omit field mappings because the indexer makes the association automatically.
-+ **metadata_storage_name** (`Edm.String`) - the file name of the blob. For example, if you have a blob /my-container/my-folder/subfolder/resume.pdf, the value of this field is `resume.pdf`.
-
-+ **metadata_storage_path** (`Edm.String`) - the full URI of the blob, including the storage account. For example, `https://myaccount.blob.core.windows.net/my-container/my-folder/subfolder/resume.pdf`
-
-+ **metadata_storage_content_type** (`Edm.String`) - content type as specified by the code you used to upload the blob. For example, `application/octet-stream`.
++ **metadata_storage_name** (`Edm.String`) is the file name of the blob. For example, if you have a blob `/my-container/my-folder/subfolder/resume.pdf`, the value of this field is `resume.pdf`.
-+ **metadata_storage_last_modified** (`Edm.DateTimeOffset`) - last modified timestamp for the blob. Azure AI Search uses this timestamp to identify changed blobs, to avoid reindexing everything after the initial indexing.
++ **metadata_storage_path** (`Edm.String`) is the full URI of the blob, including the storage account. For example, `https://myaccount.blob.core.windows.net/my-container/my-folder/subfolder/resume.pdf`.
-+ **metadata_storage_size** (`Edm.Int64`) - blob size in bytes.
++ **metadata_storage_content_type** (`Edm.String`) is the content type as specified by the code you used to upload the blob. For example, `application/octet-stream`.
++ **metadata_storage_last_modified** (`Edm.DateTimeOffset`) is the last modified timestamp for the blob. Azure AI Search uses this timestamp to identify changed blobs, to avoid reindexing everything after the initial indexing.
-+ **metadata_storage_content_md5** (`Edm.String`) - MD5 hash of the blob content, if available.
++ **metadata_storage_size** (`Edm.Int64`) is the blob size in bytes.
-+ **metadata_storage_sas_token** (`Edm.String`) - A temporary SAS token that can be used by [custom skills](cognitive-search-custom-skill-interface.md) to get access to the blob. This token shouldn't be stored for later use as it might expire.
++ **metadata_storage_content_md5** (`Edm.String`) is the MD5 hash of the blob content, if available.
++ **metadata_storage_sas_token** (`Edm.String`) is a temporary SAS token that [custom skills](cognitive-search-custom-skill-interface.md) can use to get access to the blob. Don't store this token for later use, as it might expire.
-Lastly, any metadata properties specific to the document format of the blobs you're indexing can also be represented in the index schema. For more information about content-specific metadata, see [Content metadata properties](search-blob-metadata-properties.md).
+Lastly, you can represent metadata properties specific to the document format of the blobs you're indexing in the index schema. For more information about content-specific metadata, see [Content metadata properties](search-blob-metadata-properties.md).
-It's important to point out that you don't need to define fields for all of the above properties in your search index - just capture the properties you need for your application.
+It's important to point out that you don't need to define fields for all of the above properties in your search index. Just capture the properties you need for your application.
Currently, indexing [blob index tags](/azure/storage/blobs/storage-blob-index-how-to) isn't supported by this indexer.
@@ -134,19 +127,19 @@ The data source definition specifies the data to index, credentials, and policie
}
```
-1. Set "type" to `"azureblob"` (required).
+1. Set `type` to `azureblob` (required).
-1. Set "credentials" to an Azure Storage connection string. The next section describes the supported formats.
+1. Set `credentials` to an Azure Storage connection string. The next section describes the supported formats.
-1. Set "container" to the blob container, and use "query" to specify any subfolders.
+1. Set `container` to the blob container, and use `query` to specify any subfolders.
-A data source definition can also include [soft deletion policies](search-how-to-index-azure-blob-changed-deleted.md), if you want the indexer to delete a search document when the source document is flagged for deletion.
+You can also include [soft deletion policies](search-how-to-index-azure-blob-changed-deleted.md) in a data source definition if you want the indexer to delete a search document when the source document is flagged for deletion.
<a name="credentials"></a>
### Supported credentials and connection strings
-Indexers can connect to a blob container using the following connections.
+Indexers can connect to a blob container by using the following connections.
| Full access storage account connection string |
|-----------------------------------------------|
@@ -158,24 +151,24 @@ Indexers can connect to a blob container using the following connections.
|`{ "connectionString" : "ResourceId=/subscriptions/<your subscription ID>/resourceGroups/<your resource group name>/providers/Microsoft.Storage/storageAccounts/<your storage account name>/;" }`|
|This connection string doesn't require an account key, but you must have previously configured a search service to [connect using a managed identity](search-how-to-managed-identities.md).|
-| Storage account shared access signature** (SAS) connection string |
+| Storage account shared access signature (SAS) connection string |
|-------------------------------------------------------------------|
| `{ "connectionString" : "BlobEndpoint=https://<your account>.blob.core.windows.net/;SharedAccessSignature=?sv=2016-05-31&sig=<the signature>&spr=https&se=<the validity end time>&srt=co&ss=b&sp=rl;" }` |
| The SAS should have the list and read permissions on containers and objects (blobs in this case). |
| Container shared access signature |
|-----------------------------------|
| `{ "connectionString" : "ContainerSharedAccessUri=https://<your storage account>.blob.core.windows.net/<container name>?sv=2016-05-31&sr=c&sig=<the signature>&se=<the validity end time>&sp=rl;" }` |
-| The SAS should have the list and read permissions on the container. For more information, see [Using Shared Access Signatures](/azure/storage/common/storage-sas-overview). |
+| The SAS should have the list and read permissions on the container. For more information, see [Grant limited access to Azure Storage resources using shared access signatures (SAS)](/azure/storage/common/storage-sas-overview). |
> [!NOTE]
-> If you use SAS credentials, you will need to update the data source credentials periodically with renewed signatures to prevent their expiration. If SAS credentials expire, the indexer will fail with an error message similar to "Credentials provided in the connection string are invalid or have expired".
+> If you use SAS credentials, you need to update the data source credentials periodically with renewed signatures to prevent their expiration. If SAS credentials expire, the indexer fails with an error message similar to "Credentials provided in the connection string are invalid or have expired".
## Add search fields to an index
In a [search index](search-what-is-an-index.md), add fields to accept the content and metadata of your Azure blobs.
-1. [Create or update an index](/rest/api/searchservice/indexes/create-or-update) to define search fields that will store blob content and metadata:
+1. [Create or update an index](/rest/api/searchservice/indexes/create-or-update) to define search fields that store blob content and metadata:
```http
POST https://[service name].search.windows.net/indexes?api-version=2025-09-01
@@ -191,25 +184,25 @@ In a [search index](search-what-is-an-index.md), add fields to accept the conten
}
```
-1. Create a document key field ("key": true). For blob content, the best candidates are metadata properties.
+1. Create a document key field (`"key": true`). For blob content, the best candidates are metadata properties.
- + **`metadata_storage_path`** (default) full path to the object or file. The key field ("ID" in this example) will be populated with values from metadata_storage_path because it's the default.
+ + **`metadata_storage_path`** (default) is the full path to the object or file. The key field (`ID` in this example) is populated with values from metadata_storage_path because it's the default.
- + **`metadata_storage_name`**, usable only if names are unique. If you want this field as the key, move `"key": true` to this field definition.
+ + **`metadata_storage_name`** is usable only if names are unique. If you want this field as the key, move `"key": true` to this field definition.
- + A custom metadata property that you add to blobs. This option requires that your blob upload process adds that metadata property to all blobs. Since the key is a required property, any blobs that are missing a value will fail to be indexed. If you use a custom metadata property as a key, avoid making changes to that property. Indexers will add duplicate documents for the same blob if the key property changes.
+ + A custom metadata property that you add to blobs. This option requires that your blob upload process adds that metadata property to all blobs. Since the key is a required property, any blobs that are missing a value fail to be indexed. If you use a custom metadata property as a key, avoid making changes to that property. Indexers add duplicate documents for the same blob if the key property changes.
Metadata properties often include characters, such as `/` and `-`, which are invalid for document keys. However, the indexer automatically encodes the key metadata property, with no configuration or field mapping required.
-1. Add a "content" field to store extracted text from each file through the blob's "content" property. You aren't required to use this name, but doing so lets you take advantage of implicit field mappings.
+1. Add a `content` field to store extracted text from each file through the blob's `content` property. You aren't required to use this name, but by using it, you can take advantage of implicit field mappings.
1. Add fields for standard metadata properties. The indexer can read custom metadata properties, [standard metadata](#indexing-blob-metadata) properties, and [content-specific metadata](search-blob-metadata-properties.md) properties.
<a name="PartsOfBlobToIndex"></a>
## Configure and run the blob indexer
-Once the index and data source have been created, you're ready to create the indexer. Indexer configuration specifies the inputs, parameters, and properties controlling run time behaviors. You can also specify which parts of a blob to index.
+After you create the index and data source, create the indexer. Indexer configuration specifies the inputs, parameters, and properties that control runtime behaviors. You can also specify which parts of a blob to index.
1. [Create or update an indexer](/rest/api/searchservice/indexers/create-or-update) by giving it a name and referencing the data source and target index:
@@ -235,38 +228,38 @@ Once the index and data source have been created, you're ready to create the ind
}
```
-1. Set `batchSize` if the default (10 documents) is either underutilizing or overwhelming available resources. Default batch sizes are data source specific. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.
+1. Set `batchSize` if the default (10 documents) underutilizes or overwhelms available resources. Default batch sizes are data source specific. Blob indexing sets batch size at 10 documents in recognition of the larger average document size.
-1. Under "configuration", control which blobs are indexed based on file type, or leave unspecified to retrieve all blobs.
+1. Under `configuration`, control which blobs are indexed based on file type, or leave unspecified to retrieve all blobs.
- For `"indexedFileNameExtensions"`, provide a comma-separated list of file extensions (with a leading dot). Do the same for `"excludedFileNameExtensions"` to indicate which extensions should be skipped. If the same extension is in both lists, it will be excluded from indexing.
+ For `indexedFileNameExtensions`, provide a comma-separated list of file extensions (with a leading dot). Do the same for `excludedFileNameExtensions` to indicate which extensions the indexer should skip. If the same extension is in both lists, the indexer excludes it from indexing.
-1. Under "configuration", set "dataToExtract" to control which parts of the blobs are indexed:
+1. Under `configuration`, set `dataToExtract` to control which parts of the blobs are indexed:
- + "contentAndMetadata" specifies that all metadata and textual content extracted from the blob are indexed. This is the default value.
+ + `contentAndMetadata` specifies that the indexer indexes all metadata and textual content extracted from the blob. This is the default value.
- + "storageMetadata" specifies that only the [standard blob properties and user-specified metadata](/azure/storage/blobs/storage-blob-container-properties-metadata) are indexed.
+ + `storageMetadata` specifies that the indexer indexes only the [standard blob properties and user-specified metadata](/azure/storage/blobs/storage-blob-container-properties-metadata).
- + "allMetadata" specifies that standard blob properties and any [metadata for found content types](search-blob-metadata-properties.md) are extracted from the blob content and indexed.
+ + `allMetadata` specifies that the indexer extracts from the blob content and indexes standard blob properties and any [metadata for found content types](search-blob-metadata-properties.md).
-1. Under "configuration", set "parsingMode". The default parsing mode is one search document per blob. If blobs are plain text, you can get better performance by switching to [plain text](search-how-to-index-azure-blob-plaintext.md) parsing. If you need more granular parsing that maps blobs to [multiple search documents](search-how-to-index-azure-blob-one-to-many.md), specify a different mode. One-to-many parsing is supported for blobs consisting of:
+1. Under `configuration`, set `parsingMode`. The default parsing mode is one search document per blob. If blobs are plain text, you can get better performance by switching to [plain text](search-how-to-index-azure-blob-plaintext.md) parsing. If you need more granular parsing that maps blobs to [multiple search documents](search-how-to-index-azure-blob-one-to-many.md), specify a different mode. One-to-many parsing is supported for blobs consisting of:
+ [JSON documents](search-how-to-index-azure-blob-json.md)
+ [CSV files](search-how-to-index-azure-blob-csv.md)
1. [Specify field mappings](search-indexer-field-mappings.md) if there are differences in field name or type, or if you need multiple versions of a source field in the search index.
- In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the "content" and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer will automatically replace hyphens `-` with underscores in the search index.
+ In blob indexing, you can often omit field mappings because the indexer has built-in support for mapping the `content` and metadata properties to similarly named and typed fields in an index. For metadata properties, the indexer automatically replaces hyphens `-` with underscores in the search index.
1. See [Create an indexer](search-howto-create-indexers.md) for more information about other properties. For the full list of parameter descriptions, see [REST API](/rest/api/searchservice/indexers/create).
-An indexer runs automatically when it's created. You can prevent this by setting "disabled" to true. To control indexer execution, [run an indexer on demand](search-howto-run-reset-indexers.md) or [put it on a schedule](search-howto-schedule-indexers.md).
+An indexer runs automatically when it's created. You can prevent this action by setting `disabled` to true. To control indexer execution, [run an indexer on demand](search-howto-run-reset-indexers.md) or [put it on a schedule](search-howto-schedule-indexers.md).
-## Indexing data from multiple Azure Blob containers to a single index
+## Index data from multiple Azure Blob containers to a single index
-Keep in mind that an indexer can only index data from a single container. If your requirement is to index data from multiple containers and consolidate it into a single AI Search index, this can be achieved by configuring multiple indexers, all directed to the same index. Please be aware of the [maximum number of indexers available per SKU](search-limits-quotas-capacity.md#indexer-limits).
+Remember that an indexer can only index data from a single container. If you need to index data from multiple containers and consolidate it into a single AI Search index, configure multiple indexers that all point to the same index. Be aware of the [maximum number of indexers available per SKU](search-limits-quotas-capacity.md#indexer-limits).
-To illustrate, let's consider an example of two indexers, pulling data from two distinct data sources, named `my-blob-datasource1` and `my-blob-datasource2`. Each data source points to a separate Azure Blob container, but both direct to the same index named `my-search-index`.
+For example, you can use two indexers to pull data from two distinct data sources named `my-blob-datasource1` and `my-blob-datasource2`. Each data source points to a separate Azure Blob container, but both direct to the same index named `my-search-index`.
First indexer definition example:
@@ -359,17 +352,17 @@ The response includes status and the number of items processed. It should look s
}
```
-Execution history contains up to 50 of the most recently completed executions, which are sorted in the reverse chronological order so that the latest execution comes first.
+Execution history contains up to 50 of the most recently completed executions. The entries are sorted in reverse chronological order, so the latest execution comes first.
<a name="DealingWithErrors"></a>
## Handle errors
Errors that commonly occur during indexing include unsupported content types, missing content, or oversized blobs.
-By default, the blob indexer stops as soon as it encounters a blob with an unsupported content type (for example, an audio file). You could use the "excludedFileNameExtensions" parameter to skip certain content types. However, you might want to indexing to proceed even if errors occur, and then debug individual documents later. For more information about indexer errors, see [Indexer troubleshooting guidance](search-indexer-troubleshooting.md) and [Indexer errors and warnings](cognitive-search-common-errors-warnings.md).
+By default, the blob indexer stops as soon as it encounters a blob with an unsupported content type (for example, an audio file). You can use the `excludedFileNameExtensions` parameter to skip certain content types. However, you might want indexing to proceed even if errors occur, and then debug individual documents later. For more information about indexer errors, see [Indexer troubleshooting guidance](search-indexer-troubleshooting.md) and [Indexer errors and warnings](cognitive-search-common-errors-warnings.md).
-There are five indexer properties that control the indexer's response when errors occur.
+When errors occur, five indexer parameters control the indexer's response:
```http
PUT /indexers/[indexer name]?api-version=2025-09-01
@@ -388,13 +381,13 @@ PUT /indexers/[indexer name]?api-version=2025-09-01
| Parameter | Valid values | Description |
|-----------|--------------|-------------|
-| "maxFailedItems" | -1, null or 0, positive integer | Continue indexing if errors happen at any point of processing, either while parsing blobs or while adding documents to an index. Set these properties to the number of acceptable failures. A value of `-1` allows processing no matter how many errors occur. Otherwise, the value is a positive integer. |
-| "maxFailedItemsPerBatch" | -1, null or 0, positive integer | Same as above, but used for batch indexing. |
-| "failOnUnsupportedContentType" | true or false | If the indexer is unable to determine the content type, specify whether to continue or fail the job. |
-|"failOnUnprocessableDocument" | true or false | If the indexer is unable to process a document of an otherwise supported content type, specify whether to continue or fail the job. |
-| "indexStorageMetadataOnlyForOversizedDocuments" | true or false | Oversized blobs are treated as errors by default. If you set this parameter to true, the indexer will try to index its metadata even if the content can’t be indexed. For limits on blob size, see [service Limits](search-limits-quotas-capacity.md). |
+| `maxFailedItems` | -1, null, or 0, positive integer | Continue indexing if errors happen at any point of processing, either while parsing blobs or while adding documents to an index. Set this property to the number of acceptable failures. A value of `-1` allows processing no matter how many errors occur. Otherwise, the value is a positive integer. |
+| `maxFailedItemsPerBatch` | -1, null, or 0, positive integer | Same as above, but used for batch indexing. |
+| `failOnUnsupportedContentType` | true or false | If the indexer can't determine the content type, specify whether to continue or fail the job. |
+| `failOnUnprocessableDocument` | true or false | If the indexer can't process a document of an otherwise supported content type, specify whether to continue or fail the job. |
+| `indexStorageMetadataOnlyForOversizedDocuments` | true or false | Oversized blobs are treated as errors by default. If you set this parameter to true, the indexer tries to index its metadata even if the content can't be indexed. For limits on blob size, see [service Limits](search-limits-quotas-capacity.md). |
-## See also
+## Related content
+ [Change detection and deletion detection](search-how-to-index-azure-blob-changed-deleted.md)
+ [Index large data sets](search-howto-large-index.md)
Summary
{
"modification_type": "minor update",
"modification_title": "Updates to the Azure Blob indexer article"
}
Explanation
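This change applies to the Markdown file search-how-to-index-azure-blob-storage.md, whose content was substantially updated. The main changes are:
Title and description: "Azure blob indexer" was changed to "Azure Blob Indexer" in the title and description, which makes the wording clearer. The description is also more specific and adds detail about how to automatically index content from Azure Blob Storage.
Section reorganization: The content of each section was reorganized so that the information flows more smoothly. In particular, the explanations of how to configure an indexer and of the available configuration options are better organized.
Added technical detail: Technical details of the platform were strengthened, with more coverage of specific properties and configuration steps. For example, information about role-based access control (RBAC) was added, which addresses security when you use an indexer.
Content updates: Some technical terms and explanations were reworded to reflect accurate information about the latest Azure AI Search features.
These changes are intended to make the information clearer and easier to understand, which results in a more effective document.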
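To make the error-handling table in the diff above more concrete, here is a minimal sketch that assembles an indexer definition using those five parameters. It assumes that `maxFailedItems` and `maxFailedItemsPerBatch` sit directly under `parameters` and that the three boolean settings sit under `parameters.configuration`. The indexer name `my-blob-indexer` and the chosen values are hypothetical; the data source and index names are taken from the example in the diff.

```python
import json


def build_error_tolerant_parameters(max_failed_items=-1, max_failed_items_per_batch=-1):
    """Return an indexer "parameters" object that keeps indexing when individual blobs fail.

    Assumed layout: the two counters are top-level parameters, and the three
    boolean switches live under "configuration".
    """
    return {
        "maxFailedItems": max_failed_items,
        "maxFailedItemsPerBatch": max_failed_items_per_batch,
        "configuration": {
            # Continue instead of failing the job on unknown or unprocessable content.
            "failOnUnsupportedContentType": False,
            "failOnUnprocessableDocument": False,
            # Index only the metadata of blobs that exceed the size limit.
            "indexStorageMetadataOnlyForOversizedDocuments": True,
        },
    }


indexer_body = {
    "name": "my-blob-indexer",  # hypothetical indexer name
    "dataSourceName": "my-blob-datasource1",
    "targetIndexName": "my-search-index",
    "parameters": build_error_tolerant_parameters(
        max_failed_items=10, max_failed_items_per_batch=5
    ),
}

# The JSON printed here is the body you would send with PUT /indexers/[indexer name].
print(json.dumps(indexer_body, indent=2))
```

Passing `-1` for either counter corresponds to the "no matter how many errors occur" behavior described in the table, so an explicit positive limit is usually the safer starting point.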
articles/search/search-how-to-index-azure-data-lake-storage.md
Diff
@@ -145,7 +145,7 @@ Indexers can connect to a blob container using the following connections.
| The SAS should have the list and read permissions on containers and objects (blobs in this case). |
> [!NOTE]
-> If you use SAS credentials, you will need to update the data source credentials periodically with renewed signatures to prevent their expiration. If SAS credentials expire, the indexer will fail with an error message similar to "Credentials provided in the connection string are invalid or have expired".
+> If you use SAS credentials, you need to update the data source credentials periodically with renewed signatures to prevent their expiration. If SAS credentials expire, the indexer fails with an error message similar to "Credentials provided in the connection string are invalid or have expired".
## Add search fields to an index
Summary
{
"modification_type": "minor update",
"modification_title": "Wording fix in the note about SAS credentials"
}
Explanation
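This change applies to the Markdown file search-how-to-index-azure-data-lake-storage.md. Specifically, it revises the wording of the note about shared access signature (SAS) credentials. The note previously said "you will need to update"; it now says "you need to update", which is a more direct and clearer instruction. Likewise, "the indexer will fail" became "the indexer fails".
With this fix, it's easier to understand that when you use SAS credentials, you must periodically update the data source credentials. The change is small, but it makes the information clearer.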
articles/search/search-how-to-index-azure-tables.md
Diff
@@ -54,7 +54,7 @@ The Description field provides the most verbose content. You should target this
## Use the Azure portal
-You can use either the **Import data** wizard or the **Import data (new)** wizard to automate indexing from an SQL database table or view. The data source configuration is similar for both wizards.
+You can use either the **Import data** wizard or the **Import data (new)** wizard to automate indexing from a SQL database table or view. The data source configuration is similar for both wizards.
1. [Start the wizard](search-import-data-portal.md#starting-the-wizards).
@@ -143,10 +143,10 @@ Indexers can connect to a table using the following connections.
| Container shared access signature |
|-----------------------------------|
| `{ "connectionString" : "ContainerSharedAccessUri=https://<your storage account>.blob.core.windows.net/<container name>?sv=2016-05-31&sr=c&sig=<the signature>&se=<the validity end time>&sp=rl;" }` |
-| The SAS should have the list and read permissions on the container. For more information, see [Using Shared Access Signatures](/azure/storage/common/storage-sas-overview). |
+| The SAS should have the list and read permissions on the container. For more information, see [Grant limited access to Azure Storage resources using shared access signatures (SAS)](/azure/storage/common/storage-sas-overview). |
> [!NOTE]
-> If you use SAS credentials, you'll need to update the data source credentials periodically with renewed signatures to prevent their expiration. When SAS credentials expire, the indexer will fail with an error message similar to "Credentials provided in the connection string are invalid or have expired".
+> If you use SAS credentials, you need to update the data source credentials periodically with renewed signatures to prevent their expiration. If SAS credentials expire, the indexer fails with an error message similar to "Credentials provided in the connection string are invalid or have expired".
<a name="Performance"></a>
Summary
{
"modification_type": "minor update",
"modification_title": "Revisions to the notes about SAS credentials"
}
Explanation
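This change applies to the Markdown file search-how-to-index-azure-tables.md. It revises the wording and details of the notes about shared access signature (SAS) credentials. The specific changes are:
- Improved wording:
- In the note, "you'll need to update" was revised to "you need to update", which gives a more direct instruction. This makes it clearer that when you use SAS credentials, you must periodically update the data source credentials.
- Improved consistency:
- In the phrase about the SQL database table, the article "an" was replaced with "a", so the text now reads "from a SQL database table".
- Updated link to a resource:
- The link text "Using Shared Access Signatures" was changed to the more specific "Grant limited access to Azure Storage resources using shared access signatures (SAS)". The reference to the related resource is now more accurate.
These revisions improve the accuracy and clarity of the information and highlight the important notes about SAS credentials. Overall, the document is more readable and more practical.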
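Because the note in the diff above says that expired SAS credentials cause the indexer to fail, a small check on the `se` (validity end time) field can flag a connection string before it expires. The following is a rough sketch only: the helper names `sas_expiry` and `needs_renewal` and the seven-day margin are hypothetical, and the sketch assumes the `ContainerSharedAccessUri=...;` format shown in the table, with `se` given as an ISO 8601 UTC timestamp.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit

PREFIX = "ContainerSharedAccessUri="


def sas_expiry(connection_string):
    """Return the SAS expiry (the `se` query field) as a timezone-aware UTC datetime."""
    if not connection_string.startswith(PREFIX):
        raise ValueError("expected a ContainerSharedAccessUri connection string")
    # Drop the prefix and the trailing ';' that terminates the connection string.
    uri = connection_string[len(PREFIX):].rstrip(";")
    query = parse_qs(urlsplit(uri).query)
    if "se" not in query:
        raise ValueError("no 'se' (expiry) field found in the SAS URI")
    # Accept the common trailing "Z" form of a UTC timestamp.
    expiry = datetime.fromisoformat(query["se"][0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Date-only or zone-less values are treated as UTC in this sketch.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def needs_renewal(connection_string, now, margin=timedelta(days=7)):
    """True when the SAS expires within `margin` of `now` (or has already expired)."""
    return sas_expiry(connection_string) - now <= margin


example = (
    "ContainerSharedAccessUri=https://account.blob.core.windows.net/container"
    "?sv=2016-05-31&sr=c&sig=placeholder&se=2026-03-01T00%3A00%3A00Z&sp=rl;"
)
print(needs_renewal(example, now=datetime(2026, 2, 27, tzinfo=timezone.utc)))
```

A check like this could run on a schedule alongside the indexer, so that the data source credentials are updated before the "Credentials provided in the connection string are invalid or have expired" error appears.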
articles/search/search-how-to-index-sql-managed-instance-with-managed-identity.md
Diff
@@ -1,12 +1,12 @@
---
-title: Connect to Azure SQL Managed Instance using managed identity
+title: Connect to Azure SQL Managed Instance Using a Managed Identity
titleSuffix: Azure AI Search
-description: Learn how to set up an Azure AI Search indexer connection to an Azure SQL Managed Instance using a managed identity
+description: Learn how to set up an Azure AI Search indexer connection to an Azure SQL Managed Instance using a managed identity.
author: gmndrg
ms.author: gimondra
ms.service: azure-ai-search
-ms.topic: conceptual
-ms.date: 06/04/2025
+ms.topic: how-to
+ms.date: 01/21/2026
ms.update-cycle: 180-days
ms.custom:
- ignite-2023
@@ -21,10 +21,7 @@ This article describes how to set up an Azure AI Search indexer connection to [S
You can use a system-assigned managed identity or a user-assigned managed identity (preview). Managed identities are Microsoft Entra logins and require Azure role assignments to access data in SQL Managed Instance.
-Before learning more about this feature, we recommended that you understand what an indexer is and how to set up an indexer for your data source. More information can be found at the following links:
-
-* [Indexer overview](search-indexer-overview.md)
-* [SQL Managed Instance indexer](search-how-to-index-sql-database.md)
+Before you learn more about this feature, you should understand what an indexer is and how to set up an indexer for your data source. For more information, see [Indexers in Azure AI Search](search-indexer-overview.md) and [Index data from Azure SQL Database](search-how-to-index-sql-database.md).
## Prerequisites
@@ -45,7 +42,7 @@ Follow these steps to assign the search service system managed identity permissi
- [Configure a point-to-site connection from on-premises](/azure/azure-sql/managed-instance/point-to-site-p2s-configure)
- [Configure an Azure virtual machine](/azure/azure-sql/managed-instance/connect-vm-instance-configure)
-1. Authenticate with your Microsoft Entra account.
+1. Authenticate by using your Microsoft Entra account.
:::image type="content" source="./media/search-index-azure-sql-managed-instance-with-managed-identity/sql-login.png" alt-text="Showing screenshot of the Connect to Server dialog.":::
@@ -62,7 +59,7 @@ Follow these steps to assign the search service system managed identity permissi
:::image type="content" source="./media/search-index-azure-sql-managed-instance-with-managed-identity/execute-sql-query.png" alt-text="Showing screenshot of how to execute SQL query.":::
-If you later change the search service system identity after assigning permissions, you must remove the role membership and remove the user in the SQL database, then repeat the permission assignment. Removing the role membership and user can be accomplished by running the following commands:
+If you later change the search service system identity after assigning permissions, you must remove the role membership and remove the user in the SQL database, then repeat the permission assignment. To remove the role membership and user, run the following commands:
```sql
sp_droprolemember 'db_datareader', [insert your search service name or user-assigned managed identity name];
@@ -74,17 +71,17 @@ DROP USER IF EXISTS [insert your search service name or user-assigned managed id
In this step, you give your Azure AI Search service permission to read data from your SQL Managed Instance.
-1. In the Azure portal, navigate to your SQL Managed Instance page.
+1. In the Azure portal, go to your SQL Managed Instance page.
1. Select **Access control (IAM)**.
-1. Select **Add** then **Add role assignment**.
+1. Select **Add** > **Add role assignment**.
:::image type="content" source="./media/search-index-azure-sql-managed-instance-with-managed-identity/access-control-add-role-assignment.png" alt-text="Showing screenshot of the Access Control page." lightbox="media/search-index-azure-sql-managed-instance-with-managed-identity/access-control-add-role-assignment.png":::
-1. Select **Reader** role.
+1. Select the **Reader** role.
1. Leave **Assign access to** as **Microsoft Entra user, group, or service principal**.
-1. If you're using a system-assigned managed identity, search for your search service, then select it. If you're using a user-assigned managed identity, search for the name of the user-assigned managed identity, then select it. Select **Save**.
+1. If you're using a system-assigned managed identity, search for your search service and select it. If you're using a user-assigned managed identity, search for the name of the user-assigned managed identity and select it. Select **Save**.
- Example for SQL Managed Instance using a system-assigned managed identity:
+ Here's an example for SQL Managed Instance using a system-assigned managed identity:
:::image type="content" source="./media/search-index-azure-sql-managed-instance-with-managed-identity/add-role-assignment.png" alt-text="Showing screenshot of the member role assignment.":::
@@ -96,9 +93,9 @@ Create the data source and provide a system-assigned managed identity.
The [REST API](/rest/api/searchservice/data-sources/create), Azure portal, and the [.NET SDK](/dotnet/api/azure.search.documents.indexes.models.searchindexerdatasourceconnection) support system-assigned managed identity.
-When you're connecting with a system-assigned managed identity, the only change to the data source definition is the format of the "credentials" property. You provide an Initial Catalog or Database name and a `ResourceId` that has no account key or password. The `ResourceId` must include the subscription ID of SQL Managed Instance, the resource group of SQL Managed instance, and the name of the SQL database.
+When you connect by using a system-assigned managed identity, the only change to the data source definition is the format of the "credentials" property. You provide an Initial Catalog or Database name and a `ResourceId` that has no account key or password. The `ResourceId` must include the subscription ID of SQL Managed Instance, the resource group of SQL Managed instance, and the name of the SQL database.
-Here's an example of how to create a data source to index data from a storage account using the [Create Data Source](/rest/api/searchservice/data-sources/create) REST API and a managed identity connection string. The managed identity connection string format is the same for the REST API, .NET SDK, and the Azure portal.
+Here's an example of how to create a data source to index data from a SQL Managed Instance using the [Create Data Source](/rest/api/searchservice/data-sources/create) REST API and a managed identity connection string. The managed identity connection string format is the same for the REST API, .NET SDK, and the Azure portal.
```http
POST https://[service name].search.windows.net/datasources?api-version=2025-09-01
@@ -139,7 +136,7 @@ api-key: [admin key]
## Create the indexer
-An indexer connects a data source with a target search index, and provides a schedule to automate the data refresh. Once the index and data source are created, you're ready to create the indexer.
+An indexer connects a data source with a target search index and provides a schedule to automate the data refresh. After you create the index and data source, create the indexer.
Here's a [Create Indexer](/rest/api/searchservice/indexers/create) REST API call with an Azure SQL indexer definition. The indexer runs when you submit the request.
@@ -159,7 +156,7 @@ api-key: [admin key]
If you get an error when the indexer tries to connect to the data source that says that the client isn't allowed to access the server, see the [common indexer errors](./search-indexer-troubleshooting.md).
-You can also rule out any firewall issues by trying the connection with and without restrictions in place.
+You can also rule out any firewall problems by trying the connection with and without restrictions in place.
## See also
Summary
{
"modification_type": "minor update",
"modification_title": "Wording fixes and structural changes to the document"
}
Explanation
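This change applies to the Markdown file search-how-to-index-sql-managed-instance-with-managed-identity.md and includes several wording fixes and structural changes. The main points are:
- Title fix:
- The title was changed from "Connect to Azure SQL Managed Instance using managed identity" to "Connect to Azure SQL Managed Instance Using a Managed Identity", which keeps capitalization consistent.
- General wording improvements:
- Small changes throughout the explanatory text make the sentences flow more naturally. For example, more direct phrasing such as "you should understand" and "run the following commands" is used.
- Improved link text:
- Some link descriptions were revised so that they express the target content more clearly. In particular, the references to the indexer overview and to indexing data from a SQL database are more specific.
- Structural adjustments:
- The organization of the steps and supporting information was improved, which makes the article easier to read and follow. For example, the steps are phrased consistently, and the numbering is consistent.
- Technical content fixes:
- Some technical explanations were tidied up and made more concise. For example, the note that begins "If you later change the search service system identity…" is now clearer. The data source example also now says that it indexes data from a SQL Managed Instance instead of a storage account, and `ms.topic` changed from `conceptual` to `how-to`.
These revisions improve the accuracy and clarity of the information so that it's easier to understand how to use a managed identity with Azure SQL Managed Instance. Overall, the document is more readable and more practical.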
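The diff above describes the managed identity connection string only in prose: a database name plus a `ResourceId` with no account key or password. As a hedged illustration, the sketch below assembles such a string and a data source body. The exact `ResourceId` path, the `azuresql` data source type, and the `container.name` field are assumptions recalled from the public documentation rather than details confirmed by this diff, and every name passed in is a placeholder.

```python
import json


def managed_identity_connection_string(database, subscription_id, resource_group, instance_name):
    """Build a keyless connection string for a SQL Managed Instance data source.

    Assumed format: Database=<db>;ResourceId=<ARM resource ID of the instance>;
    No account key or password is included.
    """
    resource_id = (
        f"/subscriptions/{subscription_id}"
        f"/resourcegroups/{resource_group}"
        f"/providers/Microsoft.Sql/managedInstances/{instance_name}"
    )
    return f"Database={database};ResourceId={resource_id};"


def build_data_source(name, connection_string, table_or_view):
    """Assemble a request body for the Create Data Source call (assumed field layout)."""
    return {
        "name": name,
        "type": "azuresql",
        "credentials": {"connectionString": connection_string},
        "container": {"name": table_or_view},
    }


connection_string = managed_identity_connection_string(
    database="my-database",                        # placeholder
    subscription_id="00000000-0000-0000-0000-000000000000",  # placeholder
    resource_group="my-resource-group",            # placeholder
    instance_name="my-managed-instance",           # placeholder
)
body = build_data_source("sql-mi-datasource", connection_string, "my-table")
print(json.dumps(body, indent=2))
```

Check the generated string against the example in the article itself before you use it, because the article is the authority on the exact format.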
articles/search/search-how-to-integrated-vectorization.md
Diff
@@ -7,7 +7,7 @@ author: haileytap
ms.author: haileytapia
ms.service: azure-ai-search
ms.topic: how-to
-ms.date: 09/12/2025
+ms.date: 01/16/2026
---
# Set up integrated vectorization in Azure AI Search using REST
@@ -42,7 +42,7 @@ Integrated vectorization works with [all supported data sources](search-indexer-
### Supported embedding models
-For integrated vectorization, use one of the following embedding models on an Azure AI platform. Deployment instructions are provided in a [later section](#prepare-your-embedding-model).
+Use one of the following embedding models for integrated vectorization. Deployment instructions are provided in a [later section](#prepare-your-embedding-model).
| Provider | Supported models |
|--|--|
@@ -52,9 +52,9 @@ For integrated vectorization, use one of the following embedding models on an Az
<sup>1</sup> The endpoint of your Azure OpenAI resource must have a [custom subdomain](/azure/ai-services/cognitive-services-custom-subdomains), such as `https://my-unique-name.openai.azure.com`. If you created your resource in the [Azure portal](https://portal.azure.com/), this subdomain was automatically generated during resource setup.
-<sup>2</sup> Azure OpenAI resources (with access to embedding models) that were created in the [Foundry portal](https://ai.azure.com/?cid=learnDocs) aren't supported. You must create an Azure OpenAI resource in the Azure portal.
+<sup>2</sup> Azure OpenAI resources (with access to embedding models) that were created in the [Microsoft Foundry portal](https://ai.azure.com/?cid=learnDocs) aren't supported. You must create an Azure OpenAI resource in the Azure portal.
-<sup>3</sup> For billing purposes, you must [attach your Foundry resource](cognitive-search-attach-cognitive-services.md) to the skillset in your Azure AI Search service. Unless you use a [keyless connection](cognitive-search-attach-cognitive-services.md#bill-through-a-keyless-connection) (preview) to create the skillset, both resources must be in the same region.
+<sup>3</sup> For billing purposes, you must [attach your Microsoft Foundry resource](cognitive-search-attach-cognitive-services.md) to your Azure AI Search skillset. Unless you use a [keyless connection](cognitive-search-attach-cognitive-services.md#bill-through-a-keyless-connection) (preview) to create the skillset, both resources must be in the same region.
<sup>4</sup> The Azure Vision multimodal embedding model is available in [select regions](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
@@ -234,7 +234,7 @@ Azure AI Search supports text-embedding-ada-002, text-embedding-3-small, and tex
Azure AI Search supports Azure Vision image retrieval through multimodal embeddings (version 4.0). Internally, Azure AI Search calls the [multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) to connect to Azure Vision.
-1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Foundry resource.
+1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Microsoft Foundry resource.
1. To assign roles:
@@ -255,7 +255,7 @@ Azure AI Search supports Azure Vision image retrieval through multimodal embeddi
1. Copy the endpoint with the `https://[resource-name].services.ai.azure.com` format. You specify this URL later in [Set variables](#set-variables).
> [!NOTE]
- > The multimodal embeddings are built into your Foundry resource, so there's no model deployment step.
+ > The multimodal embeddings are built into your Microsoft Foundry resource, so there's no model deployment step.
<!--### [Foundry model catalog](#tab/prepare-model-catalog)
@@ -572,7 +572,7 @@ or [Azure Vision skill](cognitive-search-skill-vision-vectorize.md)<!--[Azure Op
1. If you're using the Azure OpenAI Embedding skill, set `dimensions` to the [number of embeddings generated by your embedding model](cognitive-search-skill-azure-openai-embedding.md#supported-dimensions-by-modelname).
-1. If you're using the Azure Vision multimodal embeddings skill, [attach your Foundry resource](cognitive-search-attach-cognitive-services.md) after the `skills` array. This attachment is for billing purposes.
+1. If you're using the Azure Vision multimodal embeddings skill, [attach your Microsoft Foundry resource](cognitive-search-attach-cognitive-services.md) after the `skills` array. This attachment is for billing purposes.
```HTTP
"skills": [ ... ],
Summary
{
"modification_type": "minor update",
"modification_title": "Revised references to the Microsoft Foundry resource"
}
Explanation
This change applies to the Markdown file search-how-to-integrated-vectorization.md and includes several wording fixes and clarifications. The main points are:
- Date update:
- The date in the document was updated from "09/12/2025" to "01/16/2026" to reflect the latest revision.
- Consistent terminology:
- The term "Foundry" was changed to "Microsoft Foundry" for consistency. This makes the references to the resource more specific.
- Wording improvements:
- "For integrated vectorization, use one of the following embedding models on an Azure AI platform." was changed to "Use one of the following embedding models for integrated vectorization.", which gives a more direct instruction.
- Clearer notes:
- In the notes and explanatory text, the phrase "your Microsoft Foundry resource" makes it clear which resource is meant.
- Tidied steps:
- In the steps, the reference to the resource in "Sign in to the [Azure portal]" was revised consistently. The wording of the note was also improved, which makes its content clearer.
These fixes are intended to make it easier to understand how to set up integrated vectorization in Azure AI Search. Overall, the document is clearer and more practical.
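The first footnote in the diff above requires an Azure OpenAI endpoint with a custom subdomain, such as `https://my-unique-name.openai.azure.com`. A quick sanity check on the endpoint string might look like the sketch below. The helper name `has_custom_subdomain` is hypothetical, and the rule it applies (HTTPS plus a single label in front of `.openai.azure.com`) is an assumption drawn from that one example, not an official validation rule.

```python
from urllib.parse import urlsplit

OPENAI_SUFFIX = ".openai.azure.com"


def has_custom_subdomain(endpoint):
    """Return True if the endpoint looks like https://<name>.openai.azure.com."""
    parts = urlsplit(endpoint)
    host = parts.hostname or ""
    if parts.scheme != "https" or not host.endswith(OPENAI_SUFFIX):
        return False
    # The part before the suffix must be one non-empty DNS label.
    label = host[: -len(OPENAI_SUFFIX)]
    return bool(label) and "." not in label


for candidate in (
    "https://my-unique-name.openai.azure.com",
    "https://eastus.api.cognitive.microsoft.com",
):
    print(candidate, has_custom_subdomain(candidate))
```

A check like this only catches obviously wrong endpoint shapes. It can't tell whether the resource was created in the Azure portal, which the second footnote also requires.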
articles/search/search-how-to-semantic-chunking.md
Diff
@@ -1,12 +1,12 @@
---
-title: Chunk and vectorize by document layout
+title: Chunk and Vectorize by Document Layout
titleSuffix: Azure AI Search
description: Chunk textual content by headings and semantically coherent fragments, generate embeddings, and send the results to a searchable index.
author: haileytap
ms.author: haileytapia
ms.service: azure-ai-search
ms.topic: how-to
-ms.date: 09/28/2025
+ms.date: 01/16/2026
ms.custom:
- references_regions
- ignite-2024
@@ -22,7 +22,7 @@ In this article, learn how to:
> [!div class="checklist"]
> + Use the Document Layout skill to recognize document structure
-> + Use the Text Split skill to constrain chunk size to each markdown section
+> + Use the Text Split skill to constrain chunk size to each Markdown section
> + Generate embeddings for each chunk
> + Use index projections to map embeddings to fields in a search index
@@ -44,22 +44,22 @@ For illustration purposes, this article uses the [sample health plan PDFs](https
## Prepare data files
-The raw inputs must be in a [supported data source](search-indexer-overview.md#supported-data-sources) and the file needs to be a format which [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) supports.
+You must use a [supported data source](search-indexer-overview.md#supported-data-sources) for the raw inputs, and the file must be in a format supported by the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md).
-+ Supported file formats include: PDF, JPEG, JPG, PNG, BMP, TIFF, DOCX, XLSX, PPTX, HTML.
++ Supported file formats include PDF, JPEG, JPG, PNG, BMP, TIFF, DOCX, XLSX, PPTX, and HTML.
-+ Supported indexers can be any indexer that can handle the supported file formats. These indexers include [Blob indexers](search-how-to-index-azure-blob-storage.md), [Microsoft OneLake indexers](search-how-to-index-onelake-files.md), [File indexers](search-file-storage-integration.md).
++ Supported indexers are any indexer that can handle the supported file formats. These indexers include [Blob indexers](search-how-to-index-azure-blob-storage.md), [Microsoft OneLake indexers](search-how-to-index-onelake-files.md), and [File indexers](search-file-storage-integration.md).
-+ Supported regions for the portal experience of this feature include: East US, West Europe, North Central US. If you're setting up your skillset programmatically, you can use any Azure Document Intelligence region that also provides the AI enrichment feature of Azure AI Search. For more information, see [Product availability by region](https://azure.microsoft.com/explore/global-infrastructure/products-by-region/table).
++ Supported regions for the portal experience of this feature include East US, West Europe, and North Central US. If you're setting up your skillset programmatically, you can use any Azure Document Intelligence region that also provides the AI enrichment feature of Azure AI Search. For more information, see [Supported regions for the Document Layout skill](cognitive-search-skill-document-intelligence-layout.md#supported-regions).
You can use the Azure portal, REST APIs, or an Azure SDK package to [create a data source](search-how-to-index-azure-blob-storage.md).
> [!TIP]
-> Upload the [health plan PDF](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/health-plan) sample files to your supported data source to try out the Document Layout skill and structure-aware chunking on your own search service. The [**Import data (new)** wizard](search-get-started-portal-import-vectors.md) is an easy code-free approach for trying out this skill. Be sure to select the **default parsing mode** to use structure-aware chunking. Otherwise, the [Markdown parsing mode](search-how-to-index-azure-blob-markdown.md) is used.
+> To try the Document Layout skill and structure-aware chunking on your own search service, upload the [health plan PDF](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/health-plan) sample files to your supported data source. The [**Import data (new)** wizard](search-get-started-portal-import-vectors.md) is an easy code-free approach for trying out this skill. Be sure to select the **default parsing mode** to use structure-aware chunking. Otherwise, the [Markdown parsing mode](search-how-to-index-azure-blob-markdown.md) is used.
## Create an index for one-to-many indexing
-Here's an example payload of a single search document designed around chunks. Whenever you're working with chunks, you need a chunk field and a parent field that identifies the origin of the chunk. In this example, parent fields are the text_parent_id. Child fields are the vector and nonvector chunks of the markdown section.
+The following example shows a single search document designed around chunks. When you work with chunks, you need a chunk field and a parent field that identifies the chunk's origin. In this example, parent fields are the `text_parent_id` fields. Child fields are the vector and nonvector chunks of the Markdown section.
The Document Layout skill outputs headings and content. In this example, `header_1` through `header_3` store document headings, as detected by the skill. Other content, such as paragraphs, is stored in `chunk`. The `text_vector` field is a vector representation of the chunk field content.
@@ -183,13 +183,13 @@ If you aren't using the wizard, the index must exist on the search service befor
## Define a skillset for structure-aware chunking and vectorization
-This section shows an example of a skillset definition that projects individual markdown sections, chunks, and their vector equivalents as fields in the search index. It uses the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) to detect headings and populate a content field based on semantically coherent paragraphs and sentences in the source document. It uses the [Text Split skill](cognitive-search-skill-textsplit.md) to split the Markdown content into chunks. It uses the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) to vectorize chunks and any other field for which you want embeddings.
+The following example shows a skillset definition that projects individual Markdown sections, chunks, and their vector equivalents as fields in the search index. It uses the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) to detect headings and populate a content field based on semantically coherent paragraphs and sentences in the source document. It uses the [Text Split skill](cognitive-search-skill-textsplit.md) to split the Markdown content into chunks. It uses the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) to vectorize chunks and any other field for which you want embeddings.
Besides skills, the skillset includes `indexProjections` and `cognitiveServices`:
+ `indexProjections` are used for indexes containing chunked documents. The projections specify how parent-child content is mapped to fields in a search index for one-to-many indexing. For more information, see [Define an index projection](search-how-to-define-index-projections.md).
-+ `cognitiveServices` [attaches a Foundry resource](cognitive-search-attach-cognitive-services.md) for billing purposes (the Document Layout skill is available through [Standard pricing](https://azure.microsoft.com/pricing/details/ai-document-intelligence/)).
++ `cognitiveServices` [attaches a Microsoft Foundry resource](cognitive-search-attach-cognitive-services.md) for billing purposes. The Document Layout skill is available through [Standard pricing](https://azure.microsoft.com/pricing/details/ai-document-intelligence/).
```https
POST {endpoint}/skillsets?api-version=2025-09-01
@@ -312,14 +312,14 @@ POST {endpoint}/skillsets?api-version=2025-09-01
## Configure and run the indexer
-Once you create a data source, index, and skillset, you're ready to [create and run the indexer](search-howto-create-indexers.md#run-the-indexer). This step puts the pipeline into execution.
+After you create a data source, index, and skillset, [create and run the indexer](search-howto-create-indexers.md#run-the-indexer). This step puts the pipeline into execution.
-When using the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md), make sure to set the following parameters on the indexer definition:
+When you use the [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md), set the following parameters on the indexer definition:
-+ The `allowSkillsetToReadFileData` parameter should be set to `true`.
-+ the `parsingMode` parameter should be set to `default`.
++ Set the `allowSkillsetToReadFileData` parameter to `true`.
++ Set the `parsingMode` parameter to `default`.
-`outputFieldMappings` don't need to be set in this scenario because `indexProjections` handle the source field to search field associations. Index projections handle field associations for the Document Layout skill and also regular chunking with the split skill for imported and vectorized data workloads. Output field mappings are still necessary for transformations or complex data mappings with functions which apply in other cases. However, for n-chunks per document, index projections handle this functionality natively.
+You don't need to set `outputFieldMappings` in this scenario because `indexProjections` handle the source field to search field associations. Index projections handle field associations for the Document Layout skill and also regular chunking with the split skill for imported and vectorized data workloads. You still need output field mappings for transformations or complex data mappings with functions which apply in other cases. However, for n-chunks per document, index projections handle this functionality natively.
Here's an example of an indexer creation request.
@@ -355,7 +355,7 @@ When you send the request to the search service, the indexer runs.
You can query your search index after processing concludes to test your solution.
-To check the results, run a query against the index. Use [Search Explorer](search-explorer.md) as a search client, or any tool that sends HTTP requests. The following query selects fields that contain the output of markdown section nonvector content and its vector.
+To check the results, run a query against the index. Use [Search Explorer](search-explorer.md) as a search client, or any tool that sends HTTP requests. The following query selects fields that contain the output of Markdown section nonvector content and its vector.
For Search Explorer, you can copy just the JSON and paste it into the JSON view for query execution.
@@ -376,7 +376,6 @@ POST /indexes/[index name]/docs/search?api-version=[api-version]
"semanticConfiguration": "healthplan-doc-layout-test-semantic-configuration",
"captions": "extractive",
"answers": "extractive|count-3",
- "queryLanguage": "en-us",
"select": "header_1, header_2, header_3"
}
```
@@ -385,20 +384,15 @@ If you used the health plan PDFs to test this skill, Search Explorer results for
+ The query is a [hybrid query](hybrid-search-how-to-query.md) over text and vectors, so you see a `@search.rerankerScore` and results are ranked by that score. `searchMode=all` means that *all* query terms must be considered for a match (the default is *any*).
-+ The query uses semantic ranking, so you see `captions` (it also has `answers`, but those aren't shown in the screenshot). The results are the most semantically relevant to the query input, as determined by the [semantic ranker](semantic-search-overview.md).
++ The query uses semantic ranking, so you see `captions`. It also has `answers`, but they aren't shown in the screenshot. The results are the most semantically relevant to the query input, as determined by the [semantic ranker](semantic-search-overview.md).
+ The `select` statement (not shown in the screenshot) specifies the header fields that the Document Layout skill detects and populates. You can add more fields to the select clause to inspect the content of chunks, title, or any other human readable field.
:::image type="content" source="media/search-how-to-semantic-chunking/query-results-doc-layout.png" lightbox="media/search-how-to-semantic-chunking/query-results-doc-layout.png" alt-text="Screenshot of hybrid query results that include doc layout skill output fields.":::
-## See also
+## Related content
-+ [Create or update a skill set](cognitive-search-defining-skillset.md).
-+ [Create a data source](search-how-to-index-azure-blob-storage.md)
-+ [Define an index projection](search-how-to-define-index-projections.md)
-+ [Attach a Foundry resource](cognitive-search-attach-cognitive-services.md)
+ [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)
+ [Text Split skill](cognitive-search-skill-textsplit.md)
+ [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md)
-+ [Create indexer (REST)](/rest/api/searchservice/indexers/create)
-+ [Search Explorer](search-explorer.md)
++ [Define an index projection for parent-child indexing](search-how-to-define-index-projections.md)
Summary
{
"modification_type": "minor update",
"modification_title": "Revisions to the Document Layout skill content"
}
Explanation
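This change applies to the Markdown file search-how-to-semantic-chunking.md and consists of wording fixes and consistency improvements. The main points are:
- Title change:
- The title changes from "Chunk and vectorize by document layout" to "Chunk and Vectorize by Document Layout", so that capitalization follows one rule.
- Date update:
- The date in the document is updated from "09/28/2025" to "01/16/2026".
- Wording improvements:
- Sentences that mention Markdown are revised for consistency. For example, "markdown section" becomes "Markdown section", following the style guide.
- File format list:
- The list of supported file formats is clarified and better organized. The description of the indexer is also improved and uses the same format.
- Skillset definition:
- The description of the skillset definition is clearer and more direct, and it highlights specifics about the skills in use.
- Indexing information:
- The paragraphs about creating the indexer are reorganized and read more smoothly. The "See also" heading changes to "Related content", and the list is trimmed to the three skill references plus the index projection article.
These revisions aim to make the article easier to follow and to keep the information about Azure AI Search components clear and consistent. Overall, they improve the readability and practical usefulness of the document.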
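As a rough illustration of the indexer settings that the diff above calls out, the following Python sketch builds an indexer definition with `allowSkillsetToReadFileData` set to `true` and `parsingMode` set to `default`, and leaves `outputFieldMappings` empty because index projections handle the field associations. The helper name, the resource names, and the placement of both settings under `parameters.configuration` are assumptions made for illustration, so check them against the indexer REST reference before use.

```python
import json


def build_indexer_definition(name, data_source, index, skillset):
    """Return an indexer definition as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        "parameters": {
            # Assumed location for the two settings named in the article.
            "configuration": {
                "allowSkillsetToReadFileData": True,
                "parsingMode": "default",
            }
        },
        # Left empty: index projections map chunk fields to index fields.
        "outputFieldMappings": [],
    }


# All resource names below are placeholders.
indexer = build_indexer_definition(
    name="my-indexer",
    data_source="my-blob-datasource",
    index="my-index",
    skillset="my-skillset",
)
print(json.dumps(indexer, indent=2))
```

The printed JSON could serve as the body of an indexer creation request. Sending it is left out here because the endpoint, API version, and authentication depend on your service.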
articles/search/search-indexer-howto-access-ip-restricted.md
Diff
@@ -1,13 +1,13 @@
---
-title: Connect through firewalls
+title: Connect Through Firewalls
titleSuffix: Azure AI Search
description: Configure IP firewall rules to allow data access by an Azure AI Search indexer.
manager: nitinme
author: arv100kri
ms.author: arjagann
ms.service: azure-ai-search
ms.topic: how-to
-ms.date: 05/29/2025
+ms.date: 01/21/2026
ms.update-cycle: 180-days
ms.custom:
- ignite-2023
@@ -16,51 +16,67 @@ ms.custom:
# Configure IP firewall rules to allow indexer connections from Azure AI Search
-On behalf of an indexer, a search service issues outbound calls to an external Azure resource to pull in data during indexing. If your Azure resource uses IP firewall rules to filter incoming calls, you must create an inbound rule in your firewall that admits indexer requests.
+On behalf of an indexer, a search service makes outbound calls to an external Azure resource to pull in data during indexing. If your Azure resource uses IP firewall rules to filter incoming calls, you must create an inbound rule in your firewall that admits indexer requests.
This article explains how to find the IP address of your search service and configure an inbound IP rule on an Azure Storage account. While specific to Azure Storage, this approach also works for other Azure resources that use IP firewall rules for data access, such as Azure Cosmos DB and Azure SQL.
> [!NOTE]
-> Applicable to Azure Storage only. Your storage account and your search service must be in different regions if you want to define IP firewall rules. If your setup doesn't permit this, try the [trusted service exception](search-indexer-howto-access-trusted-service-exception.md) or [resource instance rule](/azure/storage/common/storage-network-security#grant-access-from-azure-resource-instances) instead.
+> + Applicable to Azure Storage only. To define IP firewall rules, your storage account and search service must be in different regions. If your setup doesn't permit different regions, try the [trusted service exception](search-indexer-howto-access-trusted-service-exception.md) or [resource instance rule](/azure/storage/common/storage-network-security#grant-access-from-azure-resource-instances) instead.
>
-> For private connections from indexers to any supported Azure resource, we recommend setting up a [shared private link](search-indexer-howto-access-private.md). Private connections travel the Microsoft backbone network, bypassing the public internet completely.
+> + For private connections from indexers to any supported Azure resource, we recommend setting up a [shared private link](search-indexer-howto-access-private.md). Private connections travel the Microsoft backbone network, bypassing the public internet completely.
## Get a search service IP address
-1. Get the fully qualified domain name (FQDN) of your search service. This looks like `<search-service-name>.search.windows.net`. You can find the FQDN by looking up your search service on the Azure portal.
+1. Sign in to the [Azure portal](https://portal.azure.com) and select your search service.
- :::image type="content" source="media\search-indexer-howto-secure-access\search-service-portal.png" alt-text="Screenshot of the search service Overview page." border="true":::
+1. From the left pane, select **Overview**.
-1. Look up the IP address of the search service by performing a `nslookup` (or a `ping`) of the FQDN on a command prompt. Make sure you remove the `https://` prefix from the FQDN.
+1. Copy the fully qualified domain name (FQDN) of your search service, which should look like `my-search-service.search.windows.net`.
-1. Copy the IP address so that you can specify it on an inbound rule in the next step. In the following example, the IP address that you should copy is "150.0.0.1".
+ :::image type="content" source="media/search-get-started-rest/get-endpoint.png" alt-text="Screenshot of the search service Overview page." border="true" lightbox="media/search-get-started-rest/get-endpoint.png":::
+
+1. Look up the IP address of the search service by performing an `nslookup` (or a `ping`) of the FQDN on a command prompt. Make sure you remove the `https://` prefix.
+
+1. Copy the IP address for use in the next step. In the following example, the IP address that you copy is `150.0.0.1`.
```bash
- nslookup contoso.search.windows.net
+ nslookup my-search-service.search.windows.net
Server: server.example.org
Address: 10.50.10.50
Non-authoritative answer:
Name: <name>
Address: 150.0.0.1
- aliases: contoso.search.windows.net
+ aliases: my-search-service.search.windows.net
```
## Allow access from your client IP address
-Client applications that push indexing and query requests to the search service must be represented in an IP range. On Azure, you can generally determine the IP address by pinging the FQDN of a service (for example, `ping <your-search-service-name>.search.windows.net` returns the IP address of a search service).
+Client applications that push indexing and query requests to the search service must be represented in an IP range. On Azure, you can generally determine the IP address by pinging the FQDN of a service. For example, `ping <your-search-service-name>.search.windows.net` returns the IP address of a search service.
+
+Add your client IP address to allow access to the service from the Azure portal on your current computer.
+
+1. In the Azure portal, select your search service.
+
+1. From the left pane, select **Settings** > **Networking**.
-Add your client IP address to allow access to the service from the Azure portal on your current computer. Navigate to the **Networking** section on the left pane. Change **Public Network Access** to **Selected networks**, and then check **Add your client IP address** under **Firewall**.
+1. On the **Firewall and virtual networks** tab, set **Public network access** to **Selected IP addresses**.
- :::image type="content" source="media\service-configure-firewall\azure-portal-firewall.png" alt-text="Screenshot of adding client ip to search service firewall" border="true":::
+ :::image type="content" source="media\service-configure-firewall\azure-portal-firewall.png" alt-text="Screenshot of the option to allow public network access from selected IP addresses in the Azure portal." border="true":::
+
+1. Under **IP Firewall**, select **Add your client IP address**.
+
+ :::image type="content" source="media\service-configure-firewall\azure-portal-firewall-all.png" alt-text="Screenshot of the option to add your client IP address in the Azure portal." border="true":::
+
+1. Save your changes.
## Get the Azure portal IP address
-If you're using the Azure portal or the [Import Data wizard](search-import-data-portal.md) to create an indexer, you need an inbound rule for the Azure portal as well.
+If you're using the Azure portal or an [import wizard](search-import-data-portal.md) to create an indexer, you need an inbound rule for the Azure portal as well.
To get the Azure portal's IP address, perform `nslookup` (or `ping`) on `stamp2.ext.search.windows.net`, which is the domain of the traffic manager. For nslookup, the IP address is visible in the "Non-authoritative answer" portion of the response.
-In the following example, the IP address that you should copy is "52.252.175.48".
+In the following example, the IP address that you should copy is `52.252.175.48`.
```bash
$ nslookup stamp2.ext.search.windows.net
@@ -75,9 +91,9 @@ Aliases: stamp2.ext.search.windows.net
azspncuux.management.search.windows.net
```
-Services in different regions connects to different traffic managers. Regardless of the domain name, the IP address returned from the ping is the correct one to use when defining an inbound firewall rule for the Azure portal in your region.
+Services in different regions connect to different traffic managers. Regardless of the domain name, the IP address returned from the ping is the correct one to use when defining an inbound firewall rule for the Azure portal in your region.
-For ping, the request times out, but the IP address is visible in the response. For example, in the message "Pinging azsyrie.northcentralus.cloudapp.azure.com [52.252.175.48]", the IP address is "52.252.175.48".
+For ping, the request times out, but the IP address is visible in the response. For example, in the message `Pinging azsyrie.northcentralus.cloudapp.azure.com [52.252.175.48]`, the IP address is `52.252.175.48`.
## Get IP addresses for "AzureCognitiveSearch" service tag
@@ -115,35 +131,39 @@ You can get this IP address range from the `AzureCognitiveSearch` service tag.
},
```
-1. For IP addresses have the "/32" suffix, drop the "/32" (40.91.93.84/32 becomes 40.91.93.84 in the rule definition). All other IP addresses can be used verbatim.
+1. For IP addresses that have the `/32` suffix, drop the `/32`. For example, `40.91.93.84/32` becomes `40.91.93.84` in the rule definition. All other IP addresses can be used verbatim.
1. Copy all of the IP addresses for the region.
## Add IP addresses to IP firewall rules
-Now that you have the necessary IP addresses, you can set up the inbound rules. The easiest way to add IP address ranges to a storage account's firewall rule is through the Azure portal.
+After you get the necessary IP addresses, set up the inbound rules. The easiest way to add IP address ranges to a storage account's firewall rule is through the Azure portal.
+
+1. In the Azure portal, select your storage account.
+
+1. From the left pane, select **Security + networking** > **Networking**.
-1. Locate the storage account on the Azure portal and open **Networking** on the left pane.
+1. On the **Public access** tab, select **Manage**.
-1. In the **Firewall and virtual networks** tab, choose **Selected networks**.
+ :::image type="content" source="media\search-indexer-howto-secure-access\manage-network-access.png" alt-text="Screenshot of the button to manage public network access in the Azure portal." border="true":::
- :::image type="content" source="media\search-indexer-howto-secure-access\storage-firewall.png" alt-text="Screenshot of Azure Storage Firewall and virtual networks page" border="true":::
+1. Under **Public network access scope**, select **Enable from selected networks**.
-1. Add the IP addresses obtained previously in the address range and select **Save**. You should have rules for the search service, Azure portal (optional), plus all of the IP addresses for the "AzureCognitiveSearch" service tag for your region.
+ :::image type="content" source="media\search-indexer-howto-secure-access\enable-selected-networks.png" alt-text="Screenshot of the option to enable access from selected networks in the Azure portal." border="true":::
- :::image type="content" source="media\search-indexer-howto-secure-access\storage-firewall-ip.png" alt-text="Screenshot of the IP address section of the page." border="true":::
+1. Add the IP addresses you obtained previously, and then select **Save**. You should have rules for the search service, the Azure portal (optional), and all of the IP addresses for the "AzureCognitiveSearch" service tag for your region.
-It can take five to ten minutes for the firewall rules to be updated, after which indexers should be able to access storage account data behind the firewall.
+ It can take five to ten minutes for the firewall rules to update. After the update, indexers can access storage account data behind the firewall.
## Supplement network security with token authentication
Firewalls and network security are a first step in preventing unauthorized access to data and operations. Authorization should be your next step.
-We recommend role-based access, where Microsoft Entra ID users and groups are assigned to roles that determine read and write access to your service. See [Connect to Azure AI Search using role-based access controls](search-security-rbac.md) for a description of built-in roles and instructions for creating custom roles.
+We recommend role-based access, where Microsoft Entra ID users and groups are assigned to roles that determine read and write access to your service. For a description of built-in roles and instructions for creating custom roles, see [Connect to Azure AI Search using role-based access controls](search-security-rbac.md).
If you don't need key-based authentication, we recommend that you disable API keys and use role assignments exclusively.
-## Next Steps
+## Related content
- [Configure Azure Storage firewalls](/azure/storage/common/storage-network-security)
- [Configure an IP firewall for Azure Cosmos DB](/azure/cosmos-db/how-to-configure-firewall)
Summary
{
"modification_type": "minor update",
"modification_title": "Revisions to the article on connecting through firewalls"
}
Explanation
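This change applies to the Markdown file search-indexer-howto-access-ip-restricted.md. It clarifies the content and makes the wording more consistent. The main points are:
- Title change:
- The title changes from "Connect through firewalls" to "Connect Through Firewalls", matching the title style used in the other articles.
- Date update:
- The date in the document is updated from "05/29/2025" to "01/21/2026".
- Wording improvements:
- Some phrases are simplified. For example, "a search service issues outbound calls" becomes "a search service makes outbound calls".
- Clearer steps:
- The procedures are reorganized into discrete numbered steps. This applies in particular to getting the IP address and configuring the firewall, where each portal action is now its own step.
- Note formatting:
- The note is reformatted as two bullet points, so the region requirement and the shared private link recommendation are easier to pick out.
- Related content section:
- The "Next Steps" heading changes to "Related content", which lists links to related articles.
- Security recommendations:
- The recommendation for role-based access is clearer, and the sentence that covers Microsoft Entra ID roles is restructured.
These revisions aim to make the steps for connecting to IP-restricted resources in Azure AI Search easier to understand and follow. Overall, they improve the readability and practical usefulness of the document.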
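Two of the steps in the diff above are mechanical enough to sketch in code: removing the `https://` prefix before looking up the search service's IP address, and dropping the `/32` suffix from service tag addresses before adding them to a firewall rule. The following Python sketch is one way to do both with the standard library. The function names are hypothetical, and `resolve_ipv4` stands in for the `nslookup` step and needs network access.

```python
import ipaddress
import socket


def fqdn_from_endpoint(endpoint):
    """Strip the URL scheme and any trailing slash from a search endpoint."""
    for prefix in ("https://", "http://"):
        if endpoint.startswith(prefix):
            endpoint = endpoint[len(prefix):]
    return endpoint.rstrip("/")


def resolve_ipv4(fqdn):
    """Resolve a host name to one IPv4 address (requires network access)."""
    return socket.gethostbyname(fqdn)


def to_firewall_entry(cidr):
    """Drop a /32 suffix and leave every other address or range unchanged."""
    network = ipaddress.ip_network(cidr, strict=False)
    if network.version == 4 and network.prefixlen == 32:
        return str(network.network_address)
    return cidr


print(fqdn_from_endpoint("https://my-search-service.search.windows.net"))
print(to_firewall_entry("40.91.93.84/32"))
```

Calling `resolve_ipv4(fqdn_from_endpoint(endpoint))` with your own endpoint returns the address to put in the inbound rule. The call is omitted above because its result depends on DNS.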
articles/search/search-indexer-troubleshooting.md
Diff
@@ -123,7 +123,7 @@ To update the policy and allow indexer access to the document library:
First, obtain the fully qualified domain name (FQDN) of your search service. The FQDN looks like `<your-search-service-name>.search.windows.net`. You can find the FQDN in the Azure portal.
- 
+ :::image type="content" source="media/search-get-started-rest/get-endpoint.png" alt-text="Screenshot of the search service Overview page." border="true" lightbox="media/search-get-started-rest/get-endpoint.png":::
Now that you have the FQDN, get the IP address of the search service by performing a `nslookup` (or a `ping`) of the FQDN. In the following example, you would add "150.0.0.1" to an inbound rule on the Azure Storage firewall. It might take up to 15 minutes after the firewall settings have been updated for the search service indexer to be able to access the Azure Storage account.
Summary
{
"modification_type": "minor update",
"modification_title": "Fix to the image link in the troubleshooting article"
}
Explanation
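This change applies to the Markdown file search-indexer-troubleshooting.md and mainly updates an image reference. The specific points are:
- Image update:
- Previously, the step for getting the FQDN as part of the firewall configuration referenced a different image. With this change, the reference points to a more relevant screenshot (`get-endpoint.png`).
- Consistency:
- The image reference now uses the `:::image ...:::` notation that the other articles use, which makes the document set more consistent.
This fix aims to clarify the indexer troubleshooting steps for Azure AI Search and to make the information that users need easier to see. Overall, it improves the quality and readability of the document.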
articles/search/search-relevance-overview.md
Diff
@@ -88,7 +88,7 @@ Scored results are indicated for each match in the query response. This table li
|-------|-------|-------------|
| `@search.score` | 0 through unlimited | [BM25 ranking algorithm](index-similarity-and-scoring.md#scores-in-a-text-results) for text search |
| `@search.score` | 0.333 - 1.00 | [HNSW or exhaustive KNN algorithm](vector-search-ranking.md#scores-in-a-vector-search-results) for vector search |
-| `@search.score` | 0 through an upper limit determined by the number of queries | [RRF algorithm](hybrid-search-ranking.md#scores-in-a-hybrid-search-results) |
+| `@search.score` | 0 through an upper limit determined by the number of queries | [RRF algorithm](hybrid-search-ranking.md#scores-in-hybrid-search-results) |
| `@search.rerankerScore` | 0.00 - 4.00 | [Semantic ranking algorithm](semantic-search-overview.md#how-results-are-scored) for L2 ranking |
| `@search.rerankerBoostedScore` | 0 through unlimited | [Semantic ranking with scoring profile boosting](semantic-how-to-enable-scoring-profiles.md) (scores can be significantly higher than 4) |
Summary
{
"modification_type": "minor update",
"modification_title": "Fix to the scoring algorithm article"
}
Explanation
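This change applies to the Markdown file search-relevance-overview.md and corrects a link in the table of scoring algorithms. The specific points are:
- Link anchor change in the table:
- The anchor in the "RRF algorithm" link changes from `#scores-in-a-hybrid-search-results` to `#scores-in-hybrid-search-results`. The link text is unchanged. The fix keeps the link reference accurate.
- Document consistency:
- The table layout is unchanged, so the rows stay aligned and easy to scan.
This fix aims to make the search relevance documentation more accurate and consistent, particularly in how it points to information about scoring algorithms. Overall, it improves the quality and readability of the document.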
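The RRF row in the table above says that the upper limit of `@search.score` depends on the number of queries. The following Python sketch uses the textbook Reciprocal Rank Fusion formula, 1 / (k + rank), to show why the bound grows with each fused query. The constant `k = 60`, the function names, and the sample document IDs are assumptions. The service might use a different constant or scaling, so treat the numbers as illustrative only.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with textbook RRF."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top result contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def rrf_upper_bound(num_queries, k=60):
    """Highest possible score: the document ranks first in every query."""
    return num_queries / (k + 1)


# Hypothetical rankings from a text query and a vector query.
text_ranking = ["doc-a", "doc-b", "doc-c"]
vector_ranking = ["doc-b", "doc-d", "doc-a"]

fused = rrf_fuse([text_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
print(f"upper bound for 2 queries: {rrf_upper_bound(2):.5f}")
```

Under these assumptions, a document that appears near the top of both lists outranks one that appears in only one list, and adding a third query raises the bound from 2 / 61 to 3 / 61.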
articles/search/search-security-network-security-perimeter.md
Diff
@@ -1,26 +1,26 @@
---
-title: Add a search service to a network security perimeter
+title: Add a Search Service to a Network Security Perimeter
titleSuffix: Azure AI Search
-description: Add a search service to a network security perimeter for a secure connection
+description: Learn how to add an Azure AI Search service to a network security perimeter for a secure connection.
author: haileytap
ms.author: haileytapia
manager: nitinme
ms.service: azure-ai-search
ms.custom:
- ignite-2024
ms.topic: how-to
-ms.date: 08/18/2025
+ms.date: 01/16/2026
---
# Add a search service to a network security perimeter
-A network security perimeter is a logical network boundary around your platform-as-a-service (PaaS) resources that are deployed outside of a virtual network. It establishes a perimeter for controlling public network access to resources like Azure AI Search, [Azure Storage](/azure/storage/common/storage-network-security-perimeter), and [Azure OpenAI](/azure/ai-foundry/openai/how-to/network-security-perimeter).
+A [network security perimeter](/azure/private-link/network-security-perimeter-concepts) is a logical network boundary around your platform as a service (PaaS) resources that you deploy outside of a virtual network. It establishes a perimeter for controlling public network access to resources like Azure AI Search, [Azure Storage](/azure/storage/common/storage-network-security-perimeter), and [Azure OpenAI](/azure/ai-foundry/openai/how-to/network-security-perimeter).
-This article explains how to join an Azure AI Search service to a [network security perimeter](/azure/private-link/network-security-perimeter-concepts) to control network access to your search service. By joining a network security perimeter, you can:
+This article explains how to join an Azure AI Search service to a network security perimeter to control network access to your search service. By joining a network security perimeter, you can:
* Log all access to your search service in context with other Azure resources in the same perimeter.
* Block any data exfiltration from a search service to other services outside the perimeter.
-* Allow access to your search service using inbound and outbound access capabilities of the network security perimeter.
+* Allow access to your search service by using the inbound and outbound access capabilities of the network security perimeter.
You can add a search service to a network security perimeter in the Azure portal, as described in this article. Alternatively, you can use the [Azure Virtual Network Manager REST API](/rest/api/networkmanager/) to join a search service, and use the [Search Management REST APIs](/rest/api/searchmanagement/network-security-perimeter-configurations?view=rest-searchmanagement-2025-05-01&preserve-view=true) to view and synchronize the configuration settings.
@@ -32,15 +32,15 @@ You can add a search service to a network security perimeter in the Azure portal
## Limitations
-* For search services within a network security perimeter, indexers must use a [system or user-assigned managed identity](search-how-to-managed-identities.md) and have a role assignment that permits read-access to data sources.
+* For search services within a network security perimeter, indexers must use a [system or user-assigned managed identity](search-how-to-managed-identities.md) and have a role assignment that permits read access to data sources.
* Supported indexer data sources are currently limited to [Azure Blob Storage](search-how-to-index-azure-blob-storage.md), [Azure Cosmos DB for NoSQL](./search-how-to-index-cosmosdb-sql.md), and [Azure SQL Database](search-how-to-index-sql-database.md).
* Currently, within the perimeter, indexer connections to Azure PaaS for data retrieval is the primary use case. For outbound skills-driven API calls to Foundry Tools, Azure OpenAI, or the Microsoft Foundry model catalog, or for inbound calls from Foundry for "chat with your data" scenarios, you must [configure inbound and outbound rules](#add-an-inbound-access-rule) to allow the requests through the perimeter. If you require private connections for [structure-aware chunking](search-how-to-semantic-chunking.md) and vectorization, you should [create a shared private link](search-indexer-howto-access-private.md) and a private network.
## Assign a search service to a network security perimeter
-Azure Network Security Perimeter allows administrators to define a logical network isolation boundary for PaaS resources (for example, Azure Storage and Azure SQL Database) that are deployed outside virtual networks. It restricts communication to resources within the perimeter, and it allows non-perimeter public traffic through inbound and outbound access rules.
+By using Azure Network Security Perimeter, administrators can define a logical network isolation boundary for PaaS resources, such as Azure Storage and Azure SQL Database, that are deployed outside virtual networks. It restricts communication to resources within the perimeter, and it allows non-perimeter public traffic through inbound and outbound access rules.
You can add Azure AI Search to a network security perimeter so that all indexing and query requests occur within the security boundary.
@@ -68,18 +68,18 @@ You can add Azure AI Search to a network security perimeter so that all indexing
Network security perimeter supports two different access modes for associated resources:
-| **Mode** | **Description** |
-|----------------|--------|
-| **Learning mode** | This is the default access mode. In *learning* mode, network security perimeter logs all traffic to the search service that would have been denied if the perimeter was in enforced mode. This allows network administrators to understand the existing access patterns of the search service before implementing enforcement of access rules. |
-| **Enforced mode** | In *Enforced* mode, network security perimeter logs and denies all traffic that isn't explicitly allowed by access rules. |
+| Mode | Description |
+|--|--|
+| Learning mode | This is the default access mode. In learning mode, network security perimeter logs all traffic to the search service that would be denied if the perimeter was in enforced mode. This access mode allows network administrators to understand the existing access patterns of the search service before implementing enforcement of access rules. |
+| Enforced mode | In enforced mode, network security perimeter logs and denies all traffic that isn't explicitly allowed by access rules. |
#### Network security perimeter and search service networking settings
The `publicNetworkAccess` setting determines search service association with a network security perimeter.
-* In Learning mode, the `publicNetworkAccess` setting controls public access to the resource.
+* In learning mode, the `publicNetworkAccess` setting controls public access to the resource.
-* In Enforced mode, the `publicNetworkAccess` setting is overridden by the network security perimeter rules. For example, if a search service with a `publicNetworkAccess` setting of `enabled` is associated with a network security perimeter in Enforced mode, access to the search service is still controlled by network security perimeter access rules.
+* In enforced mode, the network security perimeter rules override the `publicNetworkAccess` setting. For example, if a search service with a `publicNetworkAccess` setting of `enabled` is associated with a network security perimeter in enforced mode, access to the search service is still controlled by network security perimeter access rules.
#### Change the network security perimeter access mode
@@ -123,22 +123,22 @@ The `publicNetworkAccess` setting determines search service association with a n
#### Log Analytics workspace
-The `network-security-perimeterAccessLogs` table contains all the logs for every log category (for example `network-security-perimeterPublicInboundResourceRulesAllowed`). Every log contains a record of the network security perimeter network access that matches the log category.
+The `network-security-perimeterAccessLogs` table contains all the logs for every log category, such as `network-security-perimeterPublicInboundResourceRulesAllowed`. Each log contains a record of the network security perimeter network access that matches the log category.
Here's an example of the `network-security-perimeterPublicInboundResourceRulesAllowed` log format:
| Column Name | Meaning | Example Value |
|--|--|--|
-| ResultDescription | Name of the network access operation | POST /indexes/my-index/docs/search |
-| Profile | Which network security perimeter the search service was associated with | defaultProfile |
-| ServiceResourceId | Resource ID of the search service | `search-service-resource-id` |
-| Matched Rule | JSON description of the rule that was matched by the log | `{ "accessRule": "IP firewall" }` |
-| SourceIPAddress | Source IP of the inbound network access, if applicable | 1.1.1.1 |
-| AccessRuleVersion | Version of the network-security-perimeter access rules used to enforce the network access rules | 0 |
+| ResultDescription | Name of the network access operation. | POST /indexes/my-index/docs/search |
+| Profile | Which network security perimeter the search service was associated with. | defaultProfile |
+| ServiceResourceId | Resource ID of the search service. | `search-service-resource-id` |
+| Matched Rule | JSON description of the rule that the log matched. | `{ "accessRule": "IP firewall" }` |
+| SourceIPAddress | Source IP of the inbound network access, if applicable. | 1.1.1.1 |
+| AccessRuleVersion | Version of the network-security-perimeter access rules used to enforce the network access rules. | 0 |
#### Storage Account
-The storage account has containers for every log category (for example `insights-logs-network-security-perimeterpublicinboundperimeterrulesallowed`). The folder structure inside the container matches the resource ID of the network security perimeter and the time the logs were taken. Each line on the JSON log file contains a record of the network security perimeter network access that matches the log category.
+The storage account has containers for every log category, such as `insights-logs-network-security-perimeterpublicinboundperimeterrulesallowed`. The folder structure inside the container matches the resource ID of the network security perimeter and the time the logs were taken. Each line on the JSON log file contains a record of the network security perimeter network access that matches the log category.
For example, the inbound perimeter rules allowed category log uses the following format:
@@ -163,10 +163,10 @@ Within the perimeter, all resources have mutual access at the network level. You
For resources outside of the network security perimeter, you must specify inbound and outbound access rules. Inbound rules specify which connections to allow in, and outbound rules specify which requests are allowed out.
-A search service accepts inbound requests from apps like [Foundry portal](https://ai.azure.com/?cid=learnDocs), Azure Machine Learning prompt flow, and any app that sends indexing or query requests. A search service sends outbound requests during indexer-based indexing and skillset execution. This section explains how to set up inbound and outbound access rules for Azure AI Search scenarios.
+A search service accepts inbound requests from apps like the [Microsoft Foundry portal](https://ai.azure.com/?cid=learnDocs), Azure Machine Learning prompt flow, and any app that sends indexing or query requests. A search service sends outbound requests during indexer-based indexing and skillset execution. This section explains how to set up inbound and outbound access rules for Azure AI Search scenarios.
> [!NOTE]
- > Any service associated with a network security perimeter implicitly allows inbound and outbound access to any other service associated with the same network security perimeter when that access is authenticated using [managed identities and role assignments](/entra/identity/managed-identities-azure-resources/overview). Access rules only need to be created when allowing access outside of the network security perimeter, or for access authenticated using API keys.
+ > When you authenticate access by using [managed identities and role assignments](/entra/identity/managed-identities-azure-resources/overview), any service associated with a network security perimeter implicitly allows inbound and outbound access to any other service associated with the same network security perimeter. You only need to create access rules when you allow access outside of the network security perimeter or for access authenticated by using API keys.
### Add an inbound access rule
@@ -176,7 +176,7 @@ Network security perimeter supports two types of inbound access rules:
* IP address ranges. IP addresses or ranges must be in the Classless Inter-Domain Routing (CIDR) format. An example of CIDR notation is 192.0.2.0/24, which represents the IPs that range from 192.0.2.0 to 192.0.2.255. This type of rule allows inbound requests from any IP address within the range.
-* Subscriptions. This type of rule allows inbound access authenticated using any managed identity from the subscription.
+* Subscriptions. This type of rule allows inbound access authenticated by using any managed identity from the subscription.
To add an inbound access rule in the Azure portal:
@@ -202,7 +202,7 @@ To add an inbound access rule in the Azure portal:
| Setting | Value |
| ------- | ----- |
- | Rule name | The name for the inbound access rule, such as "MyInboundAccessRule." |
+ | Rule name | The name for the inbound access rule, such as `MyInboundAccessRule`. |
| Source type | Valid values are **IP address ranges** or **Subscriptions**. |
| Allowed sources | If you selected **IP address ranges**, enter the IP address range in CIDR format that you want to allow inbound access from. Azure IP ranges are available at [this link](https://www.microsoft.com/download/details.aspx?id=56519). If you selected **Subscriptions**, use the subscription you want to allow inbound access from. |
@@ -214,9 +214,9 @@ To add an inbound access rule in the Azure portal:
A search service makes outbound calls during indexer-based indexing and skillset execution. If your indexer data sources, Foundry Tools, or custom skill logic is outside of the network security perimeter, you should create an outbound access rule that allows your search service to make the connection.
-Recall that in public preview, Azure AI Search can only connect to Azure Storage or Azure Cosmos DB within the security perimeter. If your indexers use other data sources, you need an outbound access rule to support that connection.
+Currently, Azure AI Search can only connect to Azure Storage or Azure Cosmos DB within the security perimeter. If your indexers use other data sources, you need an outbound access rule to support that connection.
-Network security perimeter supports outbound access rules based on the Fully Qualified Domain Name (FQDN) of the destination. For example, you can allow outbound access from any service associated with your network security perimeter to an FQDN such as `mystorageaccount.blob.core.windows.net`.
+The network security perimeter supports outbound access rules based on the Fully Qualified Domain Name (FQDN) of the destination. For example, you can allow outbound access from any service associated with your network security perimeter to an FQDN such as `mystorageaccount.blob.core.windows.net`.
To add an outbound access rule in the Azure portal:
@@ -226,7 +226,7 @@ To add an outbound access rule in the Azure portal:
:::image type="content" source="media/search-security-network-security-perimeter/portal-network-security-perimeter-profiles.png" alt-text="Screenshot of the left hand menu with profiles option selected." border="true":::
-1. Select the profile you're using with your network security perimeter
+1. Select the profile you're using with your network security perimeter.
:::image type="content" source="media/search-security-network-security-perimeter/portal-network-security-perimeter-select-profile.png" alt-text="Screenshot of selecting the profile from network security perimeter." border="true":::
@@ -252,23 +252,23 @@ To add an outbound access rule in the Azure portal:
## Test your connection through network security perimeter
-In order to test your connection through network security perimeter, you need access to a web browser, either on a local computer with an internet connection or an Azure VM.
+To test your connection through network security perimeter, you need access to a web browser, either on a local computer with an internet connection or an Azure VM.
1. Change your network security perimeter association to [enforced mode](#network-security-perimeter-access-modes) to start enforcing network security perimeter requirements for network access to your search service.
1. Decide if you want to use a local computer or an Azure VM.
1. If you're using a local computer, you need to know your public IP address.
1. If you're using an Azure VM, you can either use [private link](/azure/private-link/private-link-overview) or [check the IP address using the Azure portal](/azure/virtual-network/ip-services/virtual-network-network-interface-addresses).
-1. Using the IP address, you can create an [inbound access rule](#add-an-inbound-access-rule) for that IP address to allow access. You can skip this step if you're using private link.
+1. Using the IP address, create an [inbound access rule](#add-an-inbound-access-rule) for that IP address to allow access. You can skip this step if you're using private link.
1. Finally, try navigating to the search service in the Azure portal. If you can view the indexes successfully, then the network security perimeter is configured correctly.
## View and manage network security perimeter configuration
-You can use the [Network Security Perimeter Configuration REST APIs](/rest/api/searchmanagement/network-security-perimeter-configurations?view=rest-searchmanagement-2025-05-01&preserve-view=true) to review and reconcile perimeter configurations.
+Use the [Network Security Perimeter Configuration REST APIs](/rest/api/searchmanagement/network-security-perimeter-configurations?view=rest-searchmanagement-2025-05-01&preserve-view=true) to review and reconcile perimeter configurations.
-Be sure to use `2025-05-01`, which is the latest stable REST API version. [Learn how to call the Search Management REST APIs](search-manage-rest.md).
+Be sure to use the 2025-05-01 REST API version, which is the latest stable version of the Search Management REST APIs. For more information, see [Manage your Azure AI Search service using REST APIs](search-manage-rest.md).
## Related content
Summary
{
"modification_type": "minor update",
"modification_title": "Documentation improvements for network security perimeter"
}
Explanation
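This change to the Markdown file search-security-network-security-perimeter.md consists mainly of wording and phrasing revisions. The specific points are as follows:
- Wording revisions:
- Some headings and descriptions were reworded. The title "Add a search service to a network security perimeter" was changed to "Add a Search Service to a Network Security Perimeter," and sentence flow and endings were adjusted. These edits improve the document's consistency and readability.
- Clearer explanations:
- Descriptions are more specific, particularly for the steps to add a service and for the limitations. The explanations of "public access" and `publicNetworkAccess` are clearer and easier for readers to follow.
- Improved consistency:
- Table formatting was tidied up, and the descriptions of learning mode and enforced mode are clearer. As a result, information about the network security perimeter is presented consistently and is easier to understand.
Overall, this revision raises the quality of the document and helps users understand the settings that relate the network security perimeter to Azure AI Search.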
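The diff above gives 192.0.2.0/24 as an example of CIDR notation and states that, in enforced mode, perimeter access rules override the `publicNetworkAccess` setting. The following is a minimal sketch of both points using only the Python standard library. The function names and the `access_mode` strings are hypothetical, and `is_inbound_allowed` is a simplified model that considers IP address range rules only; it is not the service's actual evaluation logic and ignores subscription rules and managed identity access.

```python
import ipaddress


def cidr_bounds(cidr):
    """Return the first address, last address, and size of a CIDR range."""
    network = ipaddress.ip_network(cidr)
    return str(network[0]), str(network[-1]), network.num_addresses


def is_inbound_allowed(access_mode, public_network_access, source_ip, allowed_ranges):
    """Simplified, hypothetical model of the two access modes (an assumption)."""
    if access_mode == "learning":
        # Learning mode: publicNetworkAccess still controls public access.
        return public_network_access == "enabled"
    if access_mode == "enforced":
        # Enforced mode: perimeter rules override publicNetworkAccess,
        # so only the inbound IP range rules are consulted here.
        address = ipaddress.ip_address(source_ip)
        return any(address in ipaddress.ip_network(r) for r in allowed_ranges)
    raise ValueError(f"unknown access mode: {access_mode}")


print(cidr_bounds("192.0.2.0/24"))
print(is_inbound_allowed("enforced", "enabled", "192.0.2.17", ["192.0.2.0/24"]))
print(is_inbound_allowed("enforced", "enabled", "198.51.100.5", ["192.0.2.0/24"]))
```

The first call confirms that the range spans 192.0.2.0 through 192.0.2.255. The last two calls show that, in this model, a `publicNetworkAccess` value of `enabled` doesn't grant access in enforced mode unless an inbound rule covers the source address.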
articles/search/vector-search-how-to-assign-narrow-data-types.md
Diff
@@ -1,22 +1,22 @@
---
-title: Assign narrow data types
+title: Assign Narrow Data Types
titleSuffix: Azure AI Search
-description: In vector search, assign narrow data types to vector fields to reduce the storage requirements of vector indexes.
+description: Learn how to assign narrow data types to vector fields to reduce the storage requirements of vector indexes.
author: haileytap
ms.author: haileytapia
ms.service: azure-ai-search
ms.update-cycle: 180-days
ms.custom:
- ignite-2024
ms.topic: how-to
-ms.date: 06/12/2025
+ms.date: 01/16/2026
---
# Assign narrow data types to vector fields in Azure AI Search
An easy way to reduce vector size is to store embeddings in a smaller data format. Most embedding models output 32-bit floating point numbers. However, if you quantize your vectors or use an embedding model that natively supports quantization, the output might be float16, int16, or int8, which are significantly smaller than float32. You can accommodate these smaller vector sizes by assigning a narrow data type to a vector field. In the vector index, narrow data types consume less storage.
-Data types are assigned to fields in an index definition. You can use the Azure portal, the [Search REST APIs](/rest/api/searchservice/indexes/create), or an Azure SDK package that provides the feature.
+You assign data types to fields in an index definition. Use the Azure portal, the [Search Service REST APIs](/rest/api/searchservice/indexes/create), or an Azure SDK package that provides the feature.
## Prerequisites
@@ -26,34 +26,33 @@ Data types are assigned to fields in an index definition. You can use the Azure
1. Review the [data types used for vector fields](/rest/api/searchservice/supported-data-types#edm-data-types-for-vector-fields) for recommended usage:
- - `Collection(Edm.Single)` 32-bit floating point (default)
- - `Collection(Edm.Half)` 16-bit floating point (narrow)
- - `Collection(Edm.Int16)` 16-bit signed integer (narrow)
- - `Collection(Edm.SByte)` 8-bit signed integer (narrow)
- - `Collection(Edm.Byte)` 8-bit unsigned integer (only allowed with packed binary data types)
-
+ - `Collection(Edm.Single)`: 32-bit floating point (default)
+ - `Collection(Edm.Half)`: 16-bit floating point (narrow)
+ - `Collection(Edm.Int16)`: 16-bit signed integer (narrow)
+ - `Collection(Edm.SByte)`: 8-bit signed integer (narrow)
+ - `Collection(Edm.Byte)`: 8-bit unsigned integer (only allowed with packed binary data types)
1. From that list, determine which data type is valid for your embedding model's output or for vectors that undergo custom quantization.
- The following table provides links to several embedding models that can use a narrow data type (`Collection(Edm.Half)`) without extra quantization. You can cast from float32 to float16 (using `Collection(Edm.Half)`) with no extra work.
+ The following table provides links to several embedding models that can use a narrow data type, `Collection(Edm.Half)`, without extra quantization. You can cast from float32 to float16 using `Collection(Edm.Half)` with no extra work.
| Embedding model | Native output | Assign this type in Azure AI Search |
|------------------------|---------------|--------------------------------|
- | [text-embedding-ada-002](/azure/ai-services/openai/concepts/models#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
- | [text-embedding-3-small](/azure/ai-services/openai/concepts/models#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
- | [text-embedding-3-large](/azure/ai-services/openai/concepts/models#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
+ | [text-embedding-ada-002](/azure/ai-foundry/foundry-models/concepts/models-sold-directly-by-azure#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
+ | [text-embedding-3-small](/azure/ai-foundry/foundry-models/concepts/models-sold-directly-by-azure#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
+ | [text-embedding-3-large](/azure/ai-foundry/foundry-models/concepts/models-sold-directly-by-azure#embeddings) | `Float32` | `Collection(Edm.Single)` or `Collection(Edm.Half)` |
| [Cohere V3 embedding models with int8 embedding_type](https://docs.cohere.com/reference/embed) | `Int8` | `Collection(Edm.SByte)` |
You can use other narrow data types if your model emits embeddings in the smaller data format or if you have custom quantization that converts vectors to a smaller format.
-1. Make sure you understand the tradeoffs of a narrow data type. `Collection(Edm.Half)` has less information, which results in lower resolution. If your data is homogeneous or dense, losing extra detail or nuance could lead to unacceptable results at query time because there's less detail that can be used to distinguish nearby vectors apart.
+1. Understand the tradeoffs of a narrow data type. `Collection(Edm.Half)` has less information, which results in lower resolution. If your data is homogeneous or dense, losing extra detail or nuance could lead to unacceptable results at query time because there's less detail that can be used to distinguish nearby vectors apart.
## Assign the data type
-[Define and build an index](vector-search-how-to-create-index.md). You can use the Azure portal, [Create or Update Index (REST API)](/rest/api/searchservice/indexes/create-or-update), or an Azure SDK package for this step.
+[Define and build an index](vector-search-how-to-create-index.md). You can use the Azure portal, [Indexes - Create Or Update](/rest/api/searchservice/indexes/create-or-update) (REST API), or an Azure SDK package for this step.
-This field definition uses a narrow data type, `Collection(Edm.Half)`, that can accept a float32 embedding stored as a float16 value. As is true for all vector fields, `dimensions` and `vectorSearchProfile` are set. The specifics of the `vectorSearchProfile` are immaterial to the datatype.
+This field definition uses a narrow data type, `Collection(Edm.Half)`, that accepts a float32 embedding stored as a float16 value. As is true for all vector fields, set the `dimensions` and `vectorSearchProfile` properties. The specifics of the `vectorSearchProfile` are immaterial to the datatype.
-We recommend that you set `retrievable` and `stored` to true if you want to visually check the values of the field. On a subsequent rebuild, you can change these properties to false for reduced storage requirements.
+Set `retrievable` and `stored` to true if you want to visually check the values of the field. On a subsequent rebuild, you can change these properties to false for reduced storage requirements.
```json
{
@@ -78,13 +77,13 @@ Recall that vector fields aren't filterable, sortable, or facetable. They can't
### Working with a production index
-Data types are assigned on new fields when they're created. You can't change the data type of an existing field, and you can't drop a field without [rebuilding the index](search-howto-reindex.md). For established indexes already in production, it's common to work around this issue by creating new fields with the desired revisions and then removing obsolete fields during a planned index rebuild.
+You assign data types on new fields when they're created. You can't change the data type of an existing field, and you can't drop a field without [rebuilding the index](search-howto-reindex.md). For established indexes already in production, a common workaround is to create new fields with the desired revisions and then remove obsolete fields during a planned index rebuild.
## Check results
-1. Verify the field content matches the data type. Assuming the vector field is marked as `retrievable`, use [Search explorer](search-explorer.md) or [Search - POST](/rest/api/searchservice/documents/search-post?) to return vector field content.
+1. Verify the field content matches the data type. Assuming the vector field is marked as `retrievable`, use [Search explorer](search-explorer.md) or [Search - POST](/rest/api/searchservice/documents/search-post?) (REST API) to return vector field content.
-1. To check vector index size, refer to the vector index size column on the **Search management > Indexes** page in the [Azure portal](https://portal.azure.com). Alternatively, you can use [GET Index Statistics (REST API)](/rest/api/searchservice/indexes/get-statistics) or an equivalent Azure SDK method.
+1. To check vector index size, refer to the vector index size column on the **Search management > Indexes** page in the [Azure portal](https://portal.azure.com). You can also use [Indexes - Get Statistics](/rest/api/searchservice/indexes/get-statistics) (REST API) or an equivalent Azure SDK method.
> [!NOTE]
-> The field's data type is used to create the physical data structure. If you want to change a data type later, either [drop and rebuild the index](search-howto-reindex.md) or create a second field with the new definition.
+> The field's data type creates the physical data structure. To change a data type later, either [drop and rebuild the index](search-howto-reindex.md) or create a second field with the new definition.
Summary
{
"modification_type": "minor update",
"modification_title": "Documentation fixes for assigning narrow data types for vector search"
}
Explanation
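This change to the Markdown file vector-search-how-to-assign-narrow-data-types.md mainly revises the title, the description, and the wording of the body text. The specific points are as follows:
- Title and description revisions:
- The title was changed from "Assign narrow data types" to "Assign Narrow Data Types" so that capitalization is consistent. The description was changed from "In vector search, assign narrow data types to vector fields to reduce the storage requirements of vector indexes." to "Learn how to assign narrow data types to vector fields to reduce the storage requirements of vector indexes.", which states the article's purpose more directly.
- Organized and clarified information:
- The list of data types now uses a uniform format, so each type's description is easier to read. In particular, the examples of which data type to assign for each embedding model are clearer.
- Clearer steps and notes:
- Steps and notes in the document were reworded and are easier to read. For example, the note about assigning data types now states more directly that the field's data type creates the physical data structure.
Overall, this revision improves the document's usability and aims to convey information about assigning data types for vector search more accurately.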
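The article in this diff says that float32 embeddings can be cast to float16 and stored in a `Collection(Edm.Half)` field at the cost of lower resolution. The following sketch illustrates that tradeoff with Python's standard `struct` module, where format `f` is a 32-bit float and `e` is a 16-bit float. The embedding values, the field name, and the profile name are hypothetical, and the dictionary is a partial field definition that only uses the properties named in the diff.

```python
import struct


def pack_vector(values, fmt):
    """Pack floats little-endian as float32 ('f') or float16 ('e')."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


embedding = [0.1, -0.25, 0.333333, 0.5]  # hypothetical 4-dimension embedding

as_float32 = pack_vector(embedding, "f")
as_float16 = pack_vector(embedding, "e")

# Round-trip through float16 to see which values lose precision.
restored = struct.unpack(f"<{len(embedding)}e", as_float16)

# Partial field definition (assumption: names are illustrative only).
vector_field = {
    "name": "vectorContent",                      # hypothetical field name
    "type": "Collection(Edm.Half)",               # narrow type from the article
    "retrievable": True,                          # lets you inspect the values
    "stored": True,
    "dimensions": len(embedding),
    "vectorSearchProfile": "my-vector-profile",   # hypothetical profile name
}

print(len(as_float32), len(as_float16))
print(restored)
```

The float16 payload is half the size of the float32 payload. Values such as -0.25 and 0.5 survive the round trip exactly, while 0.1 and 0.333333 come back slightly changed, which is the loss of resolution the article warns about for dense or homogeneous data.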
articles/search/vector-search-how-to-configure-compression-storage.md
Diff
@@ -1,5 +1,5 @@
---
-title: Choose vector optimization
+title: Choose Vector Optimization
titleSuffix: Azure AI Search
description: Learn about the vector compression options in Azure AI Search, and how to reduce storage through narrow data types, built-in scalar or quantization, truncated dimensions, and elimination of redundant storage.
author: haileytap
@@ -9,37 +9,37 @@ ms.update-cycle: 180-days
ms.custom:
- ignite-2024
ms.topic: how-to
-ms.date: 06/12/2025
+ms.date: 01/16/2026
---
# Choose an approach for optimizing vector storage and processing
-Embeddings, or the numerical representation of heterogeneous content, are the basis of vector search workloads, but the sizes of embeddings make them hard to scale and expensive to process. Significant research and productization have produced multiple solutions for improving scale and reducing processing times. Azure AI Search taps into a number these capabilities for faster and cheaper vector workloads.
+Embeddings, or the numerical representation of heterogeneous content, are the basis of vector search workloads. However, the sizes of embeddings make them hard to scale and expensive to process. Significant research and productization have produced multiple solutions for improving scale and reducing processing times. Azure AI Search taps into a number of these capabilities for faster and cheaper vector workloads.
This article covers all of the optimization techniques in Azure AI Search that can help you reduce vector size and query processing times.
-Vector optimization settings are specified in vector field definitions in a search index. Most of the features described in this article are generally available in the [@search.rerankerBoostedScore REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-@search.rerankerBoostedScore&preserve-view=true) and Azure SDK packages targeting that version. The [latest preview version](/rest/api/searchservice/search-service-api-versions#preview-versions) adds support for truncated dimensions if you're using text-embedding-3-large or text-embedding-3-small for vectorization.
+You specify vector optimization settings in vector field definitions in a search index. Most of the features described in this article are generally available in the [latest stable REST API version](/rest/api/searchservice/operation-groups?view=rest-searchservice-2025-09-01&preserve-view=true) and Azure SDK packages targeting that version.
## Evaluate the options
-Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](#example-vector-size-by-vector-compression-technique).
+Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive, so you can combine them for [maximum reduction in vector size](#example-vector-size-by-vector-compression-technique).
-We recommend built-in quantization because it compresses vector size in memory *and* on disk with minimal effort, which tends to provide the most benefit in most scenarios. In contrast, narrow types (except for float16) require special effort to create them, and `stored` saves on disk storage, which isn't as expensive as memory.
+We recommend built-in quantization because it compresses vector size in memory *and* on disk with minimal effort. This approach tends to provide the most benefit in most scenarios. In contrast, narrow types (except for float16) require special effort to create them, and `stored` saves on disk storage, which isn't as expensive as memory.
| Approach | Why use this approach |
|----------|---------------------|
-| [Add scalar or binary quantization](vector-search-how-to-quantization.md) | Compress native float32 or float16 embeddings to int8 (scalar) or byte (binary). This option reduces storage in memory and on disk with no degradation of query performance. Smaller data types, such as int8 or byte, produce vector indexes that are less content-rich than those with larger embeddings. To offset information loss, built-in compression includes options for post-query processing using uncompressed embeddings and oversampling to return more relevant results. Reranking and oversampling are specific features of built-in quantization of float32 or float16 fields and can't be used on embeddings that undergo custom quantization. |
-| [Truncate dimensions for MRL-capable text-embedding-3 models (preview)](vector-search-how-to-truncate-dimensions.md) | Use fewer dimensions on text-embedding-3 models. On Azure OpenAI, these models are retrained on the [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) (MRL) technique that produces multiple vector representations at different levels of compression. This approach produces faster searches and reduced storage costs with minimal loss of semantic information. In Azure AI Search, MRL support supplements scalar and binary quantization. When you use either quantization method, you can also specify a `truncateDimension` property on your vector fields to reduce the dimensionality of text embeddings. |
-| [Assign smaller primitive data types to vector fields](vector-search-how-to-assign-narrow-data-types.md) | Narrow data types, such as float16, int16, int8, and byte (binary), consume less space in memory and on disk, but you must have an embedding model that outputs vectors in a narrow data format. Alternatively, you must have custom quantization logic that outputs small data. A third use case that requires less effort is recasting native float32 embeddings produced by most models to float16. For information about binary vectors, see [Index binary vectors](vector-search-how-to-index-binary-data.md). |
-| [Eliminate optional storage of retrievable vectors](vector-search-how-to-storage-options.md) | Vectors returned in a query response are stored separately from vectors used during query execution. If you don't need to return vectors, you can turn off retrievable storage, reducing overall per-field disk storage by up to 50 percent. |
+| [Add scalar or binary quantization](vector-search-how-to-quantization.md) | Compress native float32 or float16 embeddings to int8 (scalar) or byte (binary). This option reduces storage in memory and on disk with no degradation of query performance. Smaller data types, such as int8 or byte, produce vector indexes that are less content-rich than those with larger embeddings. To offset information loss, built-in compression includes options for post-query processing by using uncompressed embeddings and oversampling to return more relevant results. Reranking and oversampling are specific features of built-in quantization of float32 or float16 fields and can't be used on embeddings that undergo custom quantization. |
+| [Truncate dimensions for MRL-capable text-embedding-3 models](vector-search-how-to-truncate-dimensions.md) | Use fewer dimensions on text-embedding-3 models. On Azure OpenAI, these models are retrained on the [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) (MRL) technique that produces multiple vector representations at different levels of compression. This approach produces faster searches and reduced storage costs with minimal loss of semantic information. In Azure AI Search, MRL support supplements scalar and binary quantization. When you use either quantization method, you can also specify a `truncateDimension` property on your vector fields to reduce the dimensionality of text embeddings. |
+| [Assign smaller primitive data types to vector fields](vector-search-how-to-assign-narrow-data-types.md) | Narrow data types, such as float16, int16, int8, and byte (binary), consume less space in memory and on disk. However, you must have an embedding model that outputs vectors in a narrow data format. Alternatively, you must have custom quantization logic that outputs small data. A third use case that requires less effort is recasting native float32 embeddings produced by most models to float16. For information about binary vectors, see [Index binary vectors](vector-search-how-to-index-binary-data.md). |
+| [Eliminate optional storage of retrievable vectors](vector-search-how-to-storage-options.md) | Vectors returned in a query response are stored separately from vectors used during query execution. If you don't need to return vectors, you can turn off retrievable storage to reduce overall per-field disk storage by up to 50 percent. |
-All of these options are defined on an empty index. To implement any of them, use the Azure portal, REST APIs, or an Azure SDK package targeting that API version.
+Define all of these options on an empty index. To implement any of them, use the Azure portal, REST APIs, or an Azure SDK package targeting that API version.
-After the index is defined, you can load and index documents as a separate step.
+After you define the index, you can load and index documents as a separate step.
## Example: Vector size by vector compression technique
-[Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md) is a Python code sample that creates multiple search indexes that vary by their use of vector storage quantization, [narrow data types](vector-search-how-to-assign-narrow-data-types.md), and [storage properties](vector-search-how-to-storage-options.md).
+[Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md) is a Python code sample that creates multiple search indexes that vary by their use of vector storage quantization, [narrow data types](vector-search-how-to-assign-narrow-data-types.md), and [storage properties](vector-search-how-to-storage-options.md).
This code creates and compares storage and vector index size for each vector storage optimization option. From these results, you can see that [quantization](vector-search-how-to-quantization.md) reduces vector size the most, but the greatest storage savings are achieved if you use multiple options.
@@ -51,10 +51,10 @@ This code creates and compares storage and vector index size for each vector sto
| compressiontest-no-stored | 10.9224 MB | 4.8277 MB |
| compressiontest-all-options | 4.9192 MB | 1.2242 MB |
-Search APIs report storage and vector size at the index level, so indexes and not fields must be the basis of comparison. Use [GET Index Statistics](/rest/api/searchservice/indexes/get-statistics) or an equivalent API in the Azure SDKs to obtain vector size.
+The Search Service REST APIs report storage and vector size at the index level, so you must compare indexes, not fields. Use [Indexes - Get Statistics](/rest/api/searchservice/indexes/get-statistics) (REST API) or an equivalent API in the Azure SDKs to get vector size.
## Related content
-- [Quickstart: Full-text search using REST](search-get-started-text.md)
+- [Vector search in Azure AI Search](vector-search-overview.md)
+- [Create a vector index](vector-search-how-to-create-index.md)
- [Supported data types](/rest/api/searchservice/supported-data-types)
-- [Search REST APIs](/rest/api/searchservice/)
Summary
{
"modification_type": "minor update",
"modification_title": "Revision of the vector search compression and storage configuration documentation"
}
Explanation
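This change revises the title, description, and wording of the vector-search-how-to-configure-compression-storage.md Markdown file. The main points are:
- Title change:
- The title changes from "Choose vector optimization" to "Choose Vector Optimization", which standardizes it on title case.
- Clearer wording:
- The edits go beyond the description: sentence structure and wording are also improved in the body that follows. In particular, the explanations of embedding changes and storage settings are reorganized and easier to read.
- Emphasis on recommended optimization approaches:
- The description of each optimization approach is tidied up, and concrete benefits are now listed. For example, the article emphasizes that built-in quantization offers the greatest benefit for compressing vector size in memory and on disk.
- Adjusted procedures:
- Steps and recommendations in the article are reorganized so that the actions required to implement a given technique are clear. Rephrasings such as "Define all of these options on an empty index" make it easier to understand when the configuration takes place.
- Code sample reference:
- The link text for "Vector quantization and storage options using Python" drops the "Code sample:" prefix. The surrounding text describes how the sample compares storage size and vector index size across indexes.
Overall, these changes are intended to improve the accuracy and readability of the document and to help users better understand compression and storage configuration for vector search.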
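The index-level comparison that the article describes can be illustrated with a few lines of Python. This is only a sketch using the two table rows visible in the diff above. It assumes the first number in each row is total storage size and the second is vector index size, following the order used in the article's prose. The helper names `percent_smaller` and `compare` are hypothetical and aren't part of any Azure SDK.

```python
# Sizes (in MB) copied from the two table rows visible in the diff.
# Assumption: first value is total storage size, second is vector index size.
INDEX_SIZES_MB = {
    "compressiontest-no-stored": (10.9224, 4.8277),
    "compressiontest-all-options": (4.9192, 1.2242),
}


def percent_smaller(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, as a percentage."""
    return (before - after) / before * 100.0


def compare(base: str, candidate: str) -> tuple[float, float]:
    """Compare two indexes as a whole, because sizes are reported per index."""
    base_storage, base_vector = INDEX_SIZES_MB[base]
    cand_storage, cand_vector = INDEX_SIZES_MB[candidate]
    return (
        percent_smaller(base_storage, cand_storage),
        percent_smaller(base_vector, cand_vector),
    )


storage_pct, vector_pct = compare(
    "compressiontest-no-stored", "compressiontest-all-options"
)
print(f"storage: {storage_pct:.1f}% smaller, vector index: {vector_pct:.1f}% smaller")
```

In a real service, the two numbers per index would come from the Get Statistics call named in the article rather than from a hard-coded dictionary.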
articles/search/vector-search-how-to-index-binary-data.md
Diff
@@ -1,7 +1,7 @@
---
-title: Index binary vectors for vector search
+title: Index Binary Vectors for Vector Search
titleSuffix: Azure AI Search
-description: Explains how to configure fields for binary vectors and the vector search configuration for querying the fields.
+description: Learn how to configure fields for binary vectors and the vector search configuration for querying the fields.
author: haileytap
ms.author: haileytapia
ms.service: azure-ai-search
@@ -10,12 +10,12 @@ ms.custom:
- build-2024
- ignite-2024
ms.topic: how-to
-ms.date: 05/08/2025
+ms.date: 01/16/2026
---
# Index binary vectors for vector search
-Azure AI Search supports a packed binary type of `Collection(Edm.Byte)` for further reducing the storage and memory footprint of vector data. You can use this data type for output from models such as [Cohere's Embed v3 binary embedding models](https://cohere.com/blog/introducing-embed-v3) or any other embedding model or process that outputs vectors as binary bytes.
+Azure AI Search supports the `Collection(Edm.Byte)` packed binary type to further reduce the storage and memory footprint of vector data. You can use this data type for the output of models such as [Cohere's Embed v3 binary embedding models](https://cohere.com/blog/int8-binary-embeddings) or any other embedding model or process that outputs vectors as binary bytes.
There are three steps to configuring an index for binary vectors:
@@ -24,35 +24,38 @@ There are three steps to configuring an index for binary vectors:
> + Add a vector profile that points to the algorithm
> + Add a vector field of type `Collection(Edm.Byte)` and assign the Hamming distance
-This article assumes you're familiar with [creating an index in Azure AI Search](search-how-to-create-search-index.md) and [adding vector fields](vector-search-how-to-create-index.md). It uses the REST APIs to illustrate each step, but you could also add a binary field to an index in the Azure portal or Azure SDK.
-
-The binary data type is assigned to fields using the [Create Index](/rest/api/searchservice/indexes/create) or [Create Or Update Index](/rest/api/searchservice/indexes/create-or-update) APIs.
+This article uses the REST APIs for illustration, but you can also use an Azure SDK or the Azure portal to add a binary field to an index. You assign the binary data type to fields by using the [Indexes - Create](/rest/api/searchservice/indexes/create) or [Indexes - Create Or Update](/rest/api/searchservice/indexes/create-or-update) REST APIs.
> [!TIP]
> If you're investigating binary vector support for its smaller footprint, you might also consider the vector quantization and storage reduction features in Azure AI Search. Inputs are float32 or float16 embeddings. Output is stored data in a much smaller format. For more information, see [Compress using binary or scalar quantization](vector-search-how-to-quantization.md) and [Assign narrow data types](vector-search-how-to-assign-narrow-data-types.md).
## Prerequisites
-+ Binary vectors, with 1 bit per dimension, packaged in uint8 values with 8 bits per value. These can be obtained by using models that directly generate *packaged binary* vectors, or by quantizing vectors into binary vectors client-side during indexing and searching.
++ Familiarity with [creating an index](search-how-to-create-search-index.md) and [adding vector fields](vector-search-how-to-create-index.md).
+
++ Binary vectors, with one bit per dimension, packaged in uint8 values with eight bits per value. You can get these vectors by using models that directly generate *packaged binary* vectors or by quantizing vectors into binary vectors in your client application during indexing and retrieval.
## Limitations
+ No Azure portal support in the **Import data (new)** wizard.
-+ No support for binary fields in the [AML skill](cognitive-search-aml-skill.md) that's used for integrated vectorization of models in the Microsoft Foundry model catalog.
+
++ No support for binary fields in the [AML skill](cognitive-search-aml-skill.md) that's used for integrated vectorization of models from the Microsoft Foundry model catalog.
## Add a vector search algorithm and vector profile
-Vector search algorithms are used to create the query navigation structures during indexing. For binary vector fields, vector comparisons are performed using the Hamming distance metric.
+Vector search algorithms create the query navigation structures during indexing. For binary vector fields, the system uses the Hamming distance metric to perform vector comparisons.
-1. To add a binary field to an index, set up a [`Create or Update Index`](/rest/api/searchservice/indexes/create-or-update) request using the REST API or the Azure portal.
+To configure vector search for binary vectors:
+
+1. Set up an [Indexes - Create or Update](/rest/api/searchservice/indexes/create-or-update) (REST API) request.
1. In the index schema, add a `vectorSearch` section that specifies profiles and algorithms.
-1. Add one or more [vector search algorithms](vector-search-ranking.md) that have a similarity metric of `hamming`. It's common to use Hierarchical Navigable Small Worlds (HNSW), but you can also use Hamming distance with exhaustive K-Nearest Neighbors (KNN).
+1. Add one or more [vector search algorithms](vector-search-ranking.md) that use a similarity metric of `hamming`. The Hierarchical Navigable Small Worlds (HNSW) algorithm is common, but you can also use Hamming distance with exhaustive K-Nearest Neighbors (KNN).
1. Add one or more vector profiles that specify the algorithm.
-The following example shows a basic `vectorSearch` configuration:
+The following example shows a basic `vectorSearch` configuration.
```json
"vectorSearch": {
@@ -85,19 +88,25 @@ The following example shows a basic `vectorSearch` configuration:
## Add a binary field to an index
-The fields collection of an index must include a field for the document key, vector fields, and any other fields that you need for hybrid search scenarios.
+The fields collection of an index must include a field for the document key, vector fields, and any other fields you need for hybrid search scenarios.
-Binary fields are of type `Collection(Edm.Byte)` and contain embeddings in packed form. For example, if the original embedding dimension is `1024`, the packed binary vector length is `ceiling(1024 / 8) = 128`. You get the packed form by setting the `vectorEncoding` property on the field.
+Binary fields use the `Collection(Edm.Byte)` type and contain embeddings in packed form. For example, if the original embedding dimension is `1024`, the packed binary vector length is `ceiling(1024 / 8) = 128`. You get the packed form by setting the `vectorEncoding` property on the field.
-> [!div class="checklist"]
-> + Add a field to the fields collection and give it name.
-> + Set data type to `Collection(Edm.Byte)`.
-> + Set `vectorEncoding` to `packedBit` for binary encoding.
-> + Set `dimensions` to `1024`. Specify the original (unpacked) vector dimension.
-> + Set `vectorSearchProfile` to a profile you defined in the previous step.
-> + Make the field searchable.
+To add a binary vector field to an index:
+
+1. Add a field to the fields collection and give it a name.
+
+1. Set the data type to `Collection(Edm.Byte)`.
+
+1. Set `vectorEncoding` to `packedBit` for binary encoding.
+
+1. Set `dimensions` to `1024`. Specify the original (unpacked) vector dimension.
+
+1. Set `vectorSearchProfile` to a profile you defined in the previous step.
+
+1. Set `searchable` to `true`.
-The following field definition is an example of the properties you should set:
+The following field definition is an example of a binary vector field in an index schema.
```json
"fields": [
@@ -114,8 +123,8 @@ The following field definition is an example of the properties you should set:
]
```
-## See also
+## Related content
-Code samples in the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repository demonstrate end-to-end workflows that include schema definition, vectorization, indexing, and queries.
++ Review the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repository for end-to-end workflows that include schema definition, vectorization, indexing, and queries.
-There's demo code for [Python](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python), [C#](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-dotnet), and [JavaScript](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-javascript).
++ Review the vector search demo code for [C#](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-dotnet), [Python](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-python), and [JavaScript](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-javascript).
Summary
{
"modification_type": "minor update",
"modification_title": "Revision of the documentation on indexing binary data"
}
Explanation
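This change revises the title, description, and wording of the vector-search-how-to-index-binary-data.md Markdown file. The main points are:
- Title change:
- The title changes from "Index binary vectors for vector search" to "Index Binary Vectors for Vector Search", which standardizes it on title case.
- Clearer content:
- The description changes from "Explains how to configure fields for binary vectors and the vector search configuration for querying the fields." to "Learn how to configure fields for binary vectors and the vector search configuration for querying the fields.", which states the reader's goal more directly.
- Revised and reorganized explanations:
- The steps for configuring binary vectors are reorganized so that the required information is easier to find. The familiarity requirement moves into the Prerequisites section, and the explanations of the data type and encoding are reworded for clarity.
- Consistent procedures:
- The steps for configuring a binary field change from a checklist to a numbered list of concrete actions, which makes them easier to follow. Important settings, such as "Set `searchable` to `true`", are now stated explicitly.
- Related content:
- The section that introduces related code samples is improved so that readers can reach the resources they need more easily.
Overall, these changes improve the readability of the document and make the explanation of binary data indexing easier to understand, so that users can index binary vectors in Azure AI Search with less friction.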
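The packed-length arithmetic quoted in the diff, `ceiling(1024 / 8) = 128`, and the Hamming distance used for binary fields can be sketched in Python as follows. This is a minimal illustration, not the service's implementation. The function names are hypothetical, and the bit order (most significant bit first, with zero padding in the last byte) is an assumption that you should check against the model or pipeline that produces your vectors.

```python
import math


def packed_length(dimensions: int) -> int:
    """Number of uint8 values needed to hold `dimensions` one-bit dimensions."""
    return math.ceil(dimensions / 8)


def pack_bits(bits: list[int]) -> list[int]:
    """Pack 0/1 values into uint8 values, eight bits per value.

    Assumption: most significant bit first, final byte padded with zeros.
    """
    packed = []
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the last chunk if needed
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)
        packed.append(byte)
    return packed


def hamming_distance(a: list[int], b: list[int]) -> int:
    """Count the bit positions in which two packed vectors differ."""
    if len(a) != len(b):
        raise ValueError("packed vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


print(packed_length(1024))
print(pack_bits([1, 0, 0, 0, 0, 0, 0, 1]))
```

The value passed as `dimensions` in the field definition is the unpacked count (1024 in the article's example), while the array you upload has the packed length.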
articles/search/vector-search-ranking.md
Diff
@@ -1,20 +1,20 @@
---
-title: Vector relevance and ranking
+title: Vector Relevance and Ranking
titleSuffix: Azure AI Search
-description: Explains the concepts behind vector relevance, scoring, including how matches are found in vector space and ranked in search results.
+description: Learn about the concepts behind vector relevance, scoring, including how matches are found in vector space and ranked in search results.
author: yahnoosh
ms.author: jlembicz
ms.service: azure-ai-search
ms.custom:
- ignite-2023
ms.topic: concept-article
-ms.date: 07/03/2025
+ms.date: 01/21/2026
ms.update-cycle: 180-days
---
# Relevance in vector search
-During vector query execution, the search engine looks for similar vectors to find the best candidates to return in search results. Depending on how you indexed the vector content, the search for relevant matches is either exhaustive, or constrained to nearest neighbors for faster processing. Once candidates are found, similarity metrics are used to score each result based on the strength of the match.
+During vector query execution, the search engine looks for similar vectors to find the best candidates to return in search results. Depending on how you indexed the vector content, the search for relevant matches is either exhaustive or constrained to nearest neighbors for faster processing. When candidates are found, similarity metrics are used to score each result based on the strength of the match.
This article explains the algorithms used to find relevant matches and the similarity metrics used for scoring. It also offers tips for improving relevance if search results don't meet expectations.
@@ -23,7 +23,6 @@ This article explains the algorithms used to find relevant matches and the simil
Vector search algorithms include:
+ [Exhaustive K-Nearest Neighbors (KNN)](#about-exhaustive-knn), which performs a brute-force scan of the entire vector space.
-
+ [Hierarchical Navigable Small World (HNSW)](#about-hnsw), which performs an [Approximate Nearest Neighbor (ANN)](#about-ann) search.
Only vector fields marked as `searchable` in the index or `searchFields` in the query are used for searching and scoring.
@@ -60,58 +59,58 @@ When vector fields are indexed for exhaustive KNN, the query executes against "a
### Creating the HNSW graph
-During indexing, the search service constructs the HNSW graph. The goal of indexing a new vector into an HNSW graph is to add it to the graph structure in a manner that allows for efficient nearest neighbor search. The following steps summarize the process:
+During indexing, the search service constructs the HNSW graph. The goal of indexing a new vector into an HNSW graph is to add it to the graph structure in a way that supports efficient nearest neighbor search. The following steps summarize the process:
-1. Initialization: Start with an empty HNSW graph, or the existing HNSW graph if it's not a new index.
+1. Initialization: Start with an empty HNSW graph or, if it's not a new index, the existing HNSW graph.
1. Entry point: This is the top-level of the hierarchical graph and serves as the starting point for indexing.
1. Adding to the graph: Different hierarchical levels represent different granularities of the graph, with higher levels being more global, and lower levels being more granular. Each node in the graph represents a vector point.
+ Each node is connected to up to `m` neighbors that are nearby. This is the `m` parameter.
- + The number of data points considered as candidate connections is governed by the `efConstruction` parameter. This dynamic list forms the set of closest points in the existing graph for the algorithm to consider. Higher `efConstruction` values result in more nodes being considered, which often leads to denser local neighborhoods for each vector.
+ + The `efConstruction` parameter governs the number of data points considered as candidate connections. This dynamic list forms the set of closest points in the existing graph for the algorithm to consider. Higher `efConstruction` values result in more nodes being considered, which often leads to denser local neighborhoods for each vector.
+ These connections use the configured similarity `metric` to determine distance. Some connections are "long-distance" connections that connect across different hierarchical levels, creating shortcuts in the graph that enhance search efficiency.
1. Graph pruning and optimization: This can happen after indexing all vectors, and it improves navigability and efficiency of the HNSW graph.
### Navigating the HNSW graph at query time
-A vector query navigates the hierarchical graph structure to scan for matches. The following summarize the steps in the process:
+A vector query navigates the hierarchical graph structure to scan for matches. The following steps summarize the process:
1. Initialization: The algorithm initiates the search at the top-level of the hierarchical graph. This entry point contains the set of vectors that serve as starting points for search.
-1. Traversal: Next, it traverses the graph level by level, navigating from the top-level to lower levels, selecting candidate nodes that are closer to the query vector based on the configured distance metric, such as cosine similarity.
+1. Traversal: Next, it traverses the graph level by level, navigating from the top-level to lower levels. It selects candidate nodes that are closer to the query vector based on the configured distance metric, such as cosine similarity.
-1. Pruning: To improve efficiency, the algorithm prunes the search space by only considering nodes that are likely to contain nearest neighbors. This is achieved by maintaining a priority queue of potential candidates and updating it as the search progresses. The length of this queue is configured by the parameter `efSearch`.
+1. Pruning: To improve efficiency, the algorithm prunes the search space by only considering nodes that are likely to contain nearest neighbors. It maintains a priority queue of potential candidates and updates it as the search progresses. The length of this queue is configured by the parameter `efSearch`.
-1. Refinement: As the algorithm moves to lower, more granular levels, HNSW considers more neighbors near the query, which allows the candidate set of vectors to be refined, improving accuracy.
+1. Refinement: As the algorithm moves to lower, more granular levels, HNSW considers more neighbors near the query. This consideration allows the candidate set of vectors to be refined, improving accuracy.
-1. Completion: The search completes when the desired number of nearest neighbors have been identified, or when other stopping criteria are met. This desired number of nearest neighbors is governed by the query-time parameter `k`.
+1. Completion: The search completes when the desired number of nearest neighbors are identified, or when other stopping criteria are met. The query-time parameter `k` governs this desired number of nearest neighbors.
## Similarity metrics used to measure nearness
-The algorithm finds candidate vectors to evaluate similarity. To perform this task, a similarity metric calculation compares the candidate vector to the query vector and measures the similarity. The algorithm keeps track of the ordered set of most similar vectors that its found, which forms the ranked result set when the algorithm has reached completion.
+The algorithm finds candidate vectors to evaluate similarity. To perform this task, a similarity metric calculation compares the candidate vector to the query vector and measures the similarity. The algorithm keeps track of the ordered set of most similar vectors that it found, which forms the ranked result set when the algorithm reaches completion.
| Metric | Description |
-|--------|-------------|
-| `cosine` | This metric measures the angle between two vectors, and isn't affected by differing vector lengths. Mathematically, it calculates the angle between two vectors. Cosine is the similarity metric used by [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/understand-embeddings#cosine-similarity), so if you're using Azure OpenAI, specify `cosine` in the vector configuration.|
-| `dotProduct` | This metric measures both the length of each pair of two vectors, and the angle between them. Mathematically, it calculates the products of vectors' magnitudes and the angle between them. For normalized vectors, this is identical to `cosine` similarity, but slightly more performant. |
+| -------- | ------------- |
+| `cosine` | This metric measures the angle between two vectors and isn't affected by differing vector lengths. Mathematically, it calculates the angle between two vectors. Cosine is the similarity metric used by [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/understand-embeddings#cosine-similarity), so if you're using Azure OpenAI, specify `cosine` in the vector configuration. |
+| `dotProduct` | This metric measures both the length of each pair of two vectors and the angle between them. Mathematically, it calculates the products of vectors' magnitudes and the angle between them. For normalized vectors, this metric is identical to `cosine` similarity, but it's slightly more performant. |
| `euclidean` | (also known as `l2 norm`) This metric measures the length of the vector difference between two vectors. Mathematically, it calculates the Euclidean distance between two vectors, which is the l2-norm of the difference of the two vectors. |
> [!NOTE]
> If you run two or more vector queries in parallel, or if you do a hybrid search that combines vector and text queries in the same request, [Reciprocal Rank Fusion (RRF)](hybrid-search-ranking.md) is used for scoring the final search results.
## Scores in a vector search results
-Scores are calculated and assigned to each match, with the highest matches returned as `k` results. The **`@search.score`** property contains the score. The following table shows the range within which a score will fall.
+The system calculates and assigns scores to each match. The highest matches return as `k` results. The `@search.score` property contains the score. The following table shows the range within which a score falls.
| Search method | Parameter | Scoring metric | Range |
-|---------------|-----------|-------------------|-------|
+| ------------- | --------- | -------------- | ----- |
| vector search | `@search.score` | Cosine | 0.333 - 1.00 |
-For`cosine` metric, it's important to note that the calculated `@search.score` isn't the cosine value between the query vector and the document vectors. Instead, Azure AI Search applies transformations such that the score function is monotonically decreasing, meaning score values will always decrease in value as the similarity becomes worse. This transformation ensures that search scores are usable for ranking purposes.
+For the `cosine` metric, the calculated `@search.score` isn't the cosine value between the query vector and the document vectors. Instead, Azure AI Search applies transformations so that the score function is monotonically decreasing. Score values always decrease as the similarity gets worse. This transformation ensures that search scores are usable for ranking purposes.
There are some nuances with similarity scores:
@@ -120,7 +119,7 @@ There are some nuances with similarity scores:
To create a monotonically decreasing function, the `@search.score` is defined as `1 / (1 + cosine_distance)`.
-Developers who need a cosine value instead of the synthetic value can use a formula to convert the search score back to cosine distance:
+If you need a cosine value instead of the synthetic value, use a formula to convert the search score back to cosine distance:
```csharp
double ScoreToSimilarity(double score)
@@ -134,19 +133,19 @@ Having the original cosine value can be useful in custom solutions that set up t
## Tips for relevance tuning
-If you aren't getting relevant results, experiment with changes to [query configuration](vector-search-how-to-query.md). There are no specific tuning features, such as a scoring profile or field or term boosting, for vector queries:
+If you aren't getting relevant results, try changing the [query configuration](vector-search-how-to-query.md). Vector queries don't have specific tuning features, such as a scoring profile or field or term boosting:
-+ Experiment with [chunk size and overlap](vector-search-how-to-chunk-documents.md). Try increasing the chunk size and ensuring there's sufficient overlap to preserve context or continuity between chunks.
++ Try different [chunk size and overlap](vector-search-how-to-chunk-documents.md) settings. Increase the chunk size and make sure there's enough overlap to keep context or continuity between chunks.
-+ For HNSW, try different levels of `efConstruction` to change the internal composition of the proximity graph. The default is 400. The range is 100 to 1,000.
++ For HNSW, try different levels of `efConstruction` to change the internal composition of the proximity graph. The default value is 400. The range is 100 to 1,000.
-+ Increase `k` results to feed more search results into a chat model, if you're using one.
++ Increase `k` results to send more search results to a chat model if you're using one.
+ Try [hybrid queries](hybrid-search-how-to-query.md) with semantic ranking. In benchmark testing, this combination consistently produced the most relevant results.
-## Next steps
+## Related content
-+ [Try the quickstart](search-get-started-vector.md)
++ [Quickstart: Vector search](search-get-started-vector.md)
+ [Create and configure a vector index](vector-search-how-to-create-index.md)
+ [Learn more about embeddings](vector-search-how-to-generate-embeddings.md)
+ [Learn more about data chunking](vector-search-how-to-chunk-documents.md)
Summary
{
"modification_type": "minor update",
"modification_title": "Revision of the vector search ranking documentation"
}
Explanation
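This change revises the title, description, and wording of the vector-search-ranking.md Markdown file. The main points are:
- Title change:
- The title changes from "Vector relevance and ranking" to "Vector Relevance and Ranking", which standardizes it on title case.
- Clearer purpose:
- The description changes from "Explains the concepts behind vector relevance, scoring, including how matches are found in vector space and ranked in search results." to "Learn about the concepts behind vector relevance, scoring, including how matches are found in vector space and ranked in search results.", which makes it clearer what the reader will learn.
- Reorganized steps:
- The explanations of HNSW graph construction and query-time navigation are tightened, and each step is presented more clearly. Long sentences are split and passive constructions are rewritten in active voice, which makes the purpose of each step easier to follow.
- Consistent terminology:
- Terminology is used more consistently throughout the article, which improves its consistency as a guide. For example, terms such as "candidates" and "nodes" are used uniformly.
- Tips and recommendations:
- The section on relevance tuning receives minor improvements that make the concrete actions and recommendations easier to see, in particular the suggestions about HNSW settings and adjusting the number of results.
- Related content:
- The "Next steps" section is renamed "Related content", and the quickstart link text now reads "Quickstart: Vector search", which makes it easier for readers to find related information.
These changes are intended to improve the overall readability of the document and to help users absorb information about vector search ranking more effectively.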
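The article defines `@search.score` for the `cosine` metric as `1 / (1 + cosine_distance)`, and the C# helper in the diff is shown only in part. The following Python sketch inverts that formula. It assumes that cosine distance equals `1 - cosine_similarity`, which isn't spelled out in the excerpt but is consistent with the documented score range of 0.333 to 1.00. The function names are hypothetical.

```python
def cosine_to_score(cosine_similarity: float) -> float:
    """Apply the documented transformation: score = 1 / (1 + cosine_distance).

    Assumption: cosine_distance = 1 - cosine_similarity.
    """
    cosine_distance = 1.0 - cosine_similarity
    return 1.0 / (1.0 + cosine_distance)


def score_to_cosine(score: float) -> float:
    """Invert the transformation to recover the cosine similarity."""
    if not 0.0 < score <= 1.0:
        raise ValueError("score must be in the range (0, 1]")
    cosine_distance = (1.0 - score) / score
    return 1.0 - cosine_distance


# Opposite, orthogonal, and identical directions.
for similarity in (-1.0, 0.0, 1.0):
    score = cosine_to_score(similarity)
    print(f"cosine={similarity:+.1f} -> score={score:.3f}")
```

Under this assumption, a cosine similarity of -1 maps to the lower bound of the documented range and a similarity of 1 maps to the upper bound, so a higher score always means a closer match.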