# Highlights
This change set applies minor updates to several files related to Azure AI Search. The main changes are updated author names and dates, improved information, new skills and content, and revised tutorial titles. Together, these changes improve content accuracy, usability, and the learning experience.
## New features
- The GenAI Prompt skill is introduced as a new option for processing image content.
- The tutorials now introduce new processes and skills that use generative AI.
## Breaking changes
- There are no significant breaking changes, although several tutorial titles were updated to more specific wording.
## Other updates
- Many files were updated, covering image search in the Azure portal, the Document Layout skill, the relevance ranking process, and responsible AI best practices.
- Technical documentation was reorganized, information clarified, and links updated throughout.
## Insights
The primary aim of this diff is to introduce new techniques and best practices built on the individual Azure AI Search skills, giving users a more comprehensive and modern approach that they can understand and put to use.

In many files, the tutorial titles and content were revised so that both the title and the body state more clearly what the reader will learn. These changes make navigation more intuitive and help readers reach the information they're looking for quickly.

Particularly notable is the introduction of new skills built on the GenAI Prompt and multimodal embeddings; as those keywords suggest, the practical reach of the AI features has expanded considerably. Users can now learn more advanced techniques for building indexing pipelines over both image and text content.

Overall, these updates communicate the capabilities of Azure AI Search and their applications effectively, giving developers and end users a foundation for continuing to use the technology well. Users come away with a better grasp of the new technical possibilities AI offers and can expect a more precise search experience.
## Summary Table

| File | Change |
|--|--|
| articles/search/chat-completion-skill-example-usage.md | Updated article author and date; improved content |
| articles/search/cognitive-search-concept-image-scenarios.md | Added a new skill for image content |
| articles/search/cognitive-search-skill-document-extraction.md | Revised tutorial titles |
| articles/search/cognitive-search-skill-document-intelligence-layout.md | Updated Document Layout skill information |
| articles/search/cognitive-search-skill-genai-prompt.md | Revised the GenAI Prompt skill description |
| articles/search/knowledge-store-projection-example-long.md | Revised the knowledge store shapes and projections example |
| articles/search/multimodal-search-overview.md | Updated multimodal search tutorials and sample app links |
| articles/search/responsible-ai-best-practices-genai-prompt-skill.md | Updated responsible AI best practices for the GenAI Prompt skill |
| articles/search/search-get-started-portal-image-search.md | Updated the portal image search quickstart tutorials |
| articles/search/search-relevance-overview.md | Updated the relevance overview for Azure AI Search |
| articles/search/toc.yml | Updated the Azure AI Search table of contents |
| articles/search/tutorial-document-extraction-image-verbalization.md | Retitled and revised the image verbalization tutorial |

## Modified Contents
### articles/search/chat-completion-skill-example-usage.md
#### Diff

````diff
@@ -2,11 +2,11 @@
title: Utilize the content generation capabilities of language models as part of content ingestion pipeline
titleSuffix: Azure AI Search
description: Use language models to caption your images and facilitate an image search through your data.
-author: amitkalay
-ms.author: amitkalay
+author: gmndrg
+ms.author: gimondra
ms.service: azure-ai-search
ms.topic: how-to
-ms.date: 05/05/2025
+ms.date: 07/28/2025
ms.custom:
- devx-track-csharp
- build-2025
@@ -22,20 +22,20 @@ The GenAI Prompt skill (preview) generates a description of each image in your d
To work with image content in a skillset, you need:
-+ A supported data source
-+ Files or blobs containing images
-+ Read access on the supported data source. This article uses key-based authentication, but indexers can also connect using the search service identity and Microsoft Entra ID authentication. For role-based access control, assign roles on the data source to allow read access by the service identity. If you're testing on a local development machine, make sure you also have read access on the supported data source.
-+ A search indexer, configured for image actions
-+ A skillset with the new custom genAI prompt skill
-+ A search index with fields to receive the verbalized text output, plus output field mappings in the indexer that establish association
++ A [supported data source](search-indexer-overview.md#supported-data-sources). We recommend Azure Storage.
++ Files or blobs containing images.
++ Read access to the supported data source. This article uses key-based authentication, but indexers can also connect using the search service identity and Microsoft Entra ID authentication. For role-based access control, assign roles on the data source to allow read access by the service identity. If you're testing on a local development machine, make sure you also have read access on the supported data source.
++ A [search indexer](search-how-to-create-indexers.md), configured for image actions.
++ A skillset with the new custom genAI prompt skill.
++ A search index with fields to receive the verbalized text output, plus output field mappings in the indexer that establish association.
Optionally, you can define projections to accept image-analyzed output into a [knowledge store](knowledge-store-concept-intro.md) for data mining scenarios.
<a name="get-normalized-images"></a>
## Configure indexers for image processing
-After the source files are set up, enable image normalization by setting the `imageAction` parameter in indexer configuration. Image normalization helps make images more uniform for downstream processing. Image normalization includes the following operations:
+After the source files are set up, enable image normalization by setting the `imageAction` parameter in the indexer configuration. Image normalization helps make images more uniform for downstream processing. Image normalization includes the following operations:
+ Large images are resized to a maximum height and width to make them uniform.
+ For images that have metadata that specifies orientation, image rotation is adjusted for vertical loading.
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Updated article author and date; improved content"
}
```
#### Explanation

This change applies a minor update to the articles/search/chat-completion-skill-example-usage.md file. The main changes are a new author and a new last-updated date, along with content improvements: the author changed from "amitkalay" to "gmndrg", and the date moved from May 5, 2025 to July 28, 2025.

Several statements in the article were also restructured around explicit links, and missing details were filled in, so readers can find the information they need more easily. In particular, clear links are now provided for the supported data sources and for configuring image processing.

Overall, this update improves the accuracy and usability of the content.
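The `imageAction` setting that the updated article walks through lives in the indexer definition. As a rough sketch (the object names below are placeholders, not taken from the article):

```json
{
  "name": "my-image-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-index",
  "skillsetName": "my-genai-prompt-skillset",
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "imageAction": "generateNormalizedImages"
    }
  }
}
```

Setting `imageAction` to `generateNormalizedImages` turns on the image normalization described in the diff (resizing large images and correcting rotation) and incurs the image extraction charge the article mentions.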
### articles/search/cognitive-search-concept-image-scenarios.md
#### Diff

````diff
@@ -16,6 +16,7 @@ ms.custom:
Images often contain useful information that's relevant in search scenarios. You can [vectorize images](search-get-started-portal-image-search.md) to represent visual content in your search index. Or, you can use [AI enrichment and skillsets](cognitive-search-concept-intro.md) to create and extract searchable *text* from images, including:
+ + [GenAI Prompt](cognitive-search-skill-genai-prompt.md) to pass a prompt to a chat completion skill, requesting a description of image content.
+ [OCR](cognitive-search-skill-ocr.md) for optical character recognition of text and digits
+ [Image Analysis](cognitive-search-skill-image-analysis.md) that describes images through visual features
+ [Custom skills](#passing-images-to-custom-skills) to invoke any external image processing that you want to provide
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Added a new skill for image content"
}
```
#### Explanation

This change applies a minor update to the articles/search/cognitive-search-concept-image-scenarios.md file, adding one new piece of information: the GenAI Prompt skill is introduced as a new option for processing image content.

With this update, GenAI Prompt is listed as a way to pass a prompt to a chat completion skill and request a description of image content, which broadens the options for putting image content to work in search scenarios. Alongside existing skills such as OCR (optical character recognition), Image Analysis, and custom skills, it offers readers a more diverse set of approaches.

Overall, the addition reflects expanded image processing capabilities and helps users apply AI more effectively.
### articles/search/cognitive-search-skill-document-extraction.md
#### Diff

````diff
@@ -19,9 +19,9 @@ The **Document Extraction** skill extracts content from a file within the enrich
For [vector](vector-search-overview.md) and [multimodal search](multimodal-search-overview.md), Document Extraction combined with the [Text Split skill](cognitive-search-skill-textsplit.md) is more affordable than other [data chunking approaches](vector-search-how-to-chunk-documents.md). The following tutorials demonstrate skill usage for different scenarios:
-+ [Tutorial: Index mixed content using multimodal embeddings and the Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md)
++ [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md)
-+ [Tutorial: Index mixed content using image verbalizations and the Document Extraction skill](tutorial-document-extraction-image-verbalization.md)
++ [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md)
> [!NOTE]
> This skill isn't bound to Azure AI services and has no Azure AI services key requirement.
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Revised tutorial titles"
}
```
#### Explanation

This change applies a minor update to the articles/search/cognitive-search-skill-document-extraction.md file. The main change is that two tutorial titles were revised.

Specifically, "Index mixed content using multimodal embeddings and the Document Extraction skill" became "Vectorize images and text", and "Index mixed content using image verbalizations and the Document Extraction skill" became "Verbalize images using generative AI". The new titles reflect the tutorial content more accurately.

With this update, readers can see more clearly what each tutorial offers, which should make learning more efficient. Overall, the retitling is a useful step toward more accurate information.
### articles/search/cognitive-search-skill-document-intelligence-layout.md
#### Diff

````diff
@@ -22,16 +22,15 @@ The **Document Layout** skill analyzes a document to detect structure and charac
This article is the reference documentation for the Document Layout skill. For usage information, see [How to chunk and vectorize by document layout](search-how-to-semantic-chunking.md).
-It's common to use this skill on content such as PDFs that have structure and images. The following tutorials demonstrate several scenarios:
+This skill uses the [Document Intelligence layout model](/azure/ai-services/document-intelligence/concept-layout) provided in [Azure AI Document Intelligence](/azure/ai-services/document-intelligence/overview).
-+ [Tutorial: Index mixed content using image verbalizations and the Document Layout skill](tutorial-document-layout-image-verbalization.md)
+This skill is bound to a [billable Azure AI multi-service resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services Standard price](https://azure.microsoft.com/pricing/details/cognitive-services/).
-+ [Tutorial: Index mixed content using multimodal embeddings and the Document Layout skill](tutorial-document-layout-multimodal-embeddings.md)
-
-> [!NOTE]
-> This skill uses the [Document Intelligence layout model](/azure/ai-services/document-intelligence/concept-layout) provided in [Azure AI Document Intelligence](/azure/ai-services/document-intelligence/overview).
+> [!TIP]
+> It's common to use this skill on content such as PDFs that have structure and images. The following tutorials demonstrate image verbalization with two different data chunking techniques:
>
-> This skill is bound to a [billable Azure AI multi-service resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. Execution of built-in skills is charged at the existing [Azure AI services Standard price](https://azure.microsoft.com/pricing/details/cognitive-services/).
+> - [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
+> - [Tutorial: Vectorize from a structured document layout](tutorial-document-layout-multimodal-embeddings.md)
>
## Limitations
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Updated Document Layout skill information"
}
```
#### Explanation

This change applies a minor update to the articles/search/cognitive-search-skill-document-intelligence-layout.md file. The main change is a clearer presentation of the Document Layout skill and its related information.

Specifically, the article now states up front that the skill uses the Document Intelligence layout model provided by Azure AI Document Intelligence, and a new tip explains that the skill is commonly used on content with structure and images, such as PDFs.

The tutorial list was also reorganized under the new titles "Verbalize images from a structured document layout" and "Vectorize from a structured document layout", giving concrete usage examples for the skill. Billing details for the skill are now stated plainly in the body text, where they're easier to understand.

As a result, users gain a deeper understanding of what the Document Layout skill does and how to use it, which encourages appropriate use in each scenario. Overall, the change is about organizing and clarifying information.
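For orientation, a minimal Document Layout skill definition might look like the following sketch; the parameter values are illustrative defaults drawn from the skill's public schema, not from this diff:

```json
{
  "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
  "name": "document-layout-skill",
  "description": "Chunk a document based on its detected structure",
  "context": "/document",
  "outputMode": "oneToMany",
  "markdownHeaderDepth": "h3",
  "inputs": [
    { "name": "file_data", "source": "/document/file_data" }
  ],
  "outputs": [
    { "name": "markdown_document", "targetName": "markdownDocument" }
  ]
}
```

Each output chunk corresponds to a markdown section up to the configured header depth, which is what makes the skill useful for the layout-aware chunking the tutorials cover.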
### articles/search/cognitive-search-skill-genai-prompt.md
#### Diff

````diff
@@ -1,29 +1,37 @@
---
title: GenAI Prompt skill (Preview)
titleSuffix: Azure AI Search
-description: Invokes Chat Completion models from Azure OpenAI or other Azure AI Foundry-hosted models at indexing time.
+description: Invokes chat completion models from Azure OpenAI or other Azure AI Foundry-hosted models to create content at indexing time.
author: gmndrg
ms.author: gimondra
ms.service: azure-ai-search
ms.custom:
- build-2025
ms.topic: reference
-ms.date: 05/27/2025
+ms.date: 07/28/2025
---
# GenAI Prompt skill
[!INCLUDE [Feature preview](./includes/previews/preview-generic.md)]
-The **GenAI (Generative AI) Prompt** skill executes a *chat completion* request against a Large Language Model (LLM) deployed in Azure AI Foundry or Azure OpenAI in Azure AI Foundry Models.
+The **GenAI (Generative AI) Prompt** skill executes a *chat completion* request against a Large Language Model (LLM) deployed in Azure AI Foundry or Azure OpenAI in Azure AI Foundry Models. Use this capability to create new information that can be indexed and stored as searchable content.
-Use this capability to create new information that can be indexed and stored as searchable content. Examples include verbalize images, summarize larger passages, simplify complex content, or any other task that an LLM can perform. The skill supports text, image, and multimodal content such as a PDF that contains text and images. It's common to use this skill combined with a data chunking skill. The following tutorials demonstrate the image verbalization scenarios with two different data chunking techniques:
+Here are some examples of how the GenAI prompt skill can help you create content:
-- [Tutorial: Index mixed content using image verbalizations and the Document Layout skill](tutorial-document-layout-image-verbalization.md)
+- Verbalize images
+- Summarize large passages of text
+- Simplify complex content
+- Perform any other task that you can articulate in a prompt
-- [Tutorial: Index mixed content using image verbalizations and the Document Extraction skill](tutorial-document-extraction-image-verbalization.md)
+The GenAI Prompt skill is available in the [2025-05-01-preview REST API](/rest/api/searchservice/skillsets/create?view=rest-searchservice-2025-05-01-preview&preserve-view=true) only. The skill supports text, image, and multimodal content such as a PDF that contains text and images.
-The GenAI Prompt skill is available in the [2025-05-01-preview REST API](/rest/api/searchservice/skillsets/create?view=rest-searchservice-2025-05-01-preview&preserve-view=true) only.
+> [!TIP]
+> It's common to use this skill combined with a data chunking skill. The following tutorials demonstrate image verbalization with two different data chunking techniques:
+>
+> - [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md)
+> - [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
+>
## Supported models
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Revised the GenAI Prompt skill description"
}
```
#### Explanation

This change applies a minor update to the articles/search/cognitive-search-skill-genai-prompt.md file. The main goal is to clarify what the GenAI Prompt skill does and where it's useful, with added detail for readers.

Specifically, the description now stresses that the skill calls a chat completion model to create new information, making it clear that the generated content can be indexed and stored as searchable content. The usage examples are now spelled out as a list (verbalize images, summarize large passages of text, simplify complex content), showing concretely how the skill can be applied.

The tutorial links were also reorganized under their new titles, and a tip notes that the skill is commonly combined with a data chunking skill, so readers can more easily see how to use the skill appropriately.

Overall, this minor update organizes the guidance for the GenAI Prompt skill and improves its usability.
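As a hedged sketch of what such a skill definition looks like in a skillset under the 2025-05-01-preview schema (the endpoint, prompt text, and target field names below are placeholders), the skill takes a chat completion endpoint plus message inputs:

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
  "name": "genai-prompt-skill",
  "description": "Verbalize each extracted image with a chat completion model",
  "context": "/document/normalized_images/*",
  "uri": "https://<your-foundry-resource>/openai/deployments/<deployment>/chat/completions",
  "inputs": [
    { "name": "systemMessage", "source": "='You describe images for a search index.'" },
    { "name": "userMessage", "source": "='Describe this image concisely.'" },
    { "name": "image", "source": "/document/normalized_images/*/data" }
  ],
  "outputs": [
    { "name": "response", "targetName": "verbalizedImage" }
  ]
}
```

The `response` output can then be passed to an embedding skill and mapped to a searchable index field, which is the pattern the tutorials linked in the diff demonstrate.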
### articles/search/knowledge-store-projection-example-long.md
#### Diff

````diff
@@ -7,22 +7,22 @@ manager: nitinme
author: HeidiSteen
ms.author: heidist
ms.service: azure-ai-search
-ms.topic: conceptual
-ms.date: 06/17/2025
+ms.topic: concept-article
+ms.date: 07/28/2025
ms.custom:
- ignite-2023
- sfi-ropc-nochange
---
-# Detailed example of shapes and projections in a knowledge store
+# Example of shapes and projections in a knowledge store
-This article provides a detailed example that supplements [high-level concepts](knowledge-store-projection-overview.md) and [syntax-based articles](knowledge-store-projections-examples.md) by walking you through the shaping and projection steps required for fully expressing the output of a rich skillset in a [knowledge store](knowledge-store-concept-intro.md).
+This article provides a detailed example that supplements [high-level concepts](knowledge-store-projection-overview.md) and [syntax-based articles](knowledge-store-projections-examples.md) by walking you through the shaping and projection steps required for fully expressing the output of a rich skillset in a [knowledge store](knowledge-store-concept-intro.md) in Azure Storage.
-If your application requirements call for multiple skills and projections, this example can give you a better idea of how shapes and projections intersect.
+If your application requirements call for multiple skills and projections, this example can give you a better idea of how shapes and projections interact.
## Set up sample data
-Sample documents aren't included with the Projections collection, but the [AI enrichment demo data files](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/ai-enrichment-mixed-media) contain text and images that work with the projections described in this example.
+Sample documents aren't included with the Projections collection, but the [AI enrichment demo data files](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/ai-enrichment-mixed-media) contain text and images that work with the projections described in this example. If you use this sample data, you can skip step that [attaches an Azure AI multi-service account](cognitive-search-attach-cognitive-services.md) because you stay under the daily indexer limit for free enrichments.
Create a blob container in Azure Storage and upload all 14 items.
@@ -39,7 +39,7 @@ Pay close attention to skill outputs (targetNames). Outputs written to the enric
```json
{
"name": "projections-demo-ss",
- "description": "Skillset that enriches blob data found in "merged_content". The enrichment granularity is a document.",
+ "description": "Skillset that enriches blob data found in the merged_content field. The enrichment granularity is a document.",
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
@@ -182,12 +182,15 @@ Pay close attention to skill outputs (targetNames). Outputs written to the enric
"cognitiveServices": {
"@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
"description": "An Azure AI services resource in the same region as Search.",
- "key": "<Azure AI services All-in-ONE KEY>"
+ "key": ""
},
"knowledgeStore": null
}
```
+> [!NOTE]
+> Under `"cognitiveServices"`, the key field is unspecified because the indexer can use an Azure AI multi-service account in the same region as your search service and process up to 20 transactions daily at no charge. The sample data for this example stays under the 20 transaction limit.
+
## Example Shaper skill
A [Shaper skill](cognitive-search-skill-shaper.md) is a utility for working with existing enriched content instead of creating new enriched content. Adding a Shaper to a skillset lets you create a custom shape that you can project into table or blob storage. Without a custom shape, projections are limited to referencing a single node (one projection per output), which isn't suitable for tables. Creating a custom shape aggregates various elements into a new logical whole that can be projected as a single table, or sliced and distributed across a collection of tables.
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Revised the knowledge store shapes and projections example"
}
```
#### Explanation

This change applies a minor update to the articles/search/knowledge-store-projection-example-long.md file. The main goal is to clarify the worked example of shapes and projections in a knowledge store so readers can follow it more easily.

Specifically, the article title was shortened from "Detailed example" to "Example", and a mention of Azure Storage was added so readers better understand where a knowledge store lives. The wording was also corrected to say that shapes and projections "interact" rather than "intersect".

The sample data section now points out that, with this sample, you can skip attaching an Azure AI multi-service account because you stay under the daily indexer limit for free enrichments. A new note likewise explains why the key field under "cognitiveServices" is left unspecified: the indexer can use an Azure AI multi-service account in the same region as the search service and process up to 20 transactions daily at no charge.

Overall, this minor update organizes the material on knowledge store projections so users can understand it better through a concrete example.
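To make the shape-then-project relationship concrete, here's a compressed sketch of a Shaper skill feeding a table projection in a skillset's knowledge store definition; the field and table names are illustrative, not taken from the article:

```json
{
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
      "name": "shaper-skill",
      "description": "Aggregate enriched nodes into one custom shape",
      "context": "/document",
      "inputs": [
        { "name": "docName", "source": "/document/metadata_storage_name" },
        { "name": "keyPhrases", "source": "/document/merged_content/keyphrases/*" }
      ],
      "outputs": [
        { "name": "output", "targetName": "tableprojection" }
      ]
    }
  ],
  "knowledgeStore": {
    "storageConnectionString": "<storage-connection-string>",
    "projections": [
      {
        "tables": [
          {
            "tableName": "Documents",
            "generatedKeyName": "DocumentId",
            "source": "/document/tableprojection"
          }
        ],
        "objects": [],
        "files": []
      }
    ]
  }
}
```

Without the Shaper skill's custom shape, each projection could reference only a single node, which is the limitation the article calls out for tables.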
### articles/search/multimodal-search-overview.md
#### Diff

````diff
@@ -116,8 +116,8 @@ To help you get started with multimodal search in Azure AI Search, here's a coll
| Content | Description |
|--|--|
| [Quickstart: Multimodal search in the Azure portal](search-get-started-portal-image-search.md) | Create and test a multimodal index in the Azure portal using the wizard and Search Explorer. |
-| [Tutorial: Image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md) | Extract text and images, verbalize diagrams, and embed the resulting descriptions and text into a searchable index. |
-| [Tutorial: Multimodal embeddings and Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md) | Use a vision-text model to embed both text and images directly, enabling visual-similarity search over scanned PDFs. |
-| [Tutorial: Image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md) | Apply layout-aware chunking and diagram verbalization, capture location metadata, and store cropped images for precise citations and page highlights. |
-| [Tutorial: Multimodal embeddings and Document Layout skill](tutorial-document-layout-multimodal-embeddings.md) | Combine layout-aware chunking with unified embeddings for hybrid semantic and keyword search that returns exact hit locations. |
+| [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md) | Extract text and images, verbalize diagrams, and embed the resulting descriptions and text into a searchable index. |
+| [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md) | Use a vision-text model to embed both text and images directly, enabling visual-similarity search over scanned PDFs. |
+| [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md) | Apply layout-aware chunking and diagram verbalization, capture location metadata, and store cropped images for precise citations and page highlights. |
+| [Tutorial: Vectorize from a structured document layout](tutorial-document-layout-multimodal-embeddings.md) | Combine layout-aware chunking with unified embeddings for hybrid semantic and keyword search that returns exact hit locations. |
| [Sample app: Multimodal RAG GitHub repository](https://aka.ms/azs-multimodal-sample-app-repo) | An end-to-end, code-ready RAG application with multimodal capabilities that surfaces both text snippets and image annotations. Ideal for jump-starting enterprise copilots. |
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Updated multimodal search tutorials and sample app links"
}
```
#### Explanation

This change applies a minor update to the articles/search/multimodal-search-overview.md file. The main change is that the links to the multimodal search tutorials were updated so the table points at more clearly named resources.

Specifically, the tutorial titles were modernized. Entries such as "Image verbalization and Document Extraction skill" and "Multimodal embeddings and Document Layout skill" now read as direct statements of the task, such as "Verbalize images using generative AI" and "Vectorize from a structured document layout".

The link targets are the same tutorials as before, but the titles now reflect current terminology and techniques, helping users apply multimodal search more effectively. The changes let readers reach the content they want faster and improve the learning experience.

Overall, this minor update raises the practical value of the multimodal search documentation and its resources.
### articles/search/responsible-ai-best-practices-genai-prompt-skill.md
#### Diff

````diff
@@ -8,7 +8,7 @@ ms.service: azure-ai-search
ms.custom:
- build-2025
ms.topic: concept-article
-ms.date: 04/28/2025
+ms.date: 07/28/2025
---
# Best practices - GenAI Prompt skill
@@ -25,38 +25,46 @@ The content generation capabilities of language models are continuing to evolve
In order to list out the various challenges in incorporating AI content generation capabilities into an Azure AI Search indexer pipeline, it's important to understand the various personas that interact with the RAG application as each of them might carry a different set of challenges.
-* End-user: This persona is the one that is asking questions to the RAG application, expecting a well cited answer to their question based on results from the source document. In addition to accuracy of the answer, the end-user expects that any citations provided by the application make it clear if it was from verbatim content in a file from the data source or if it was based off say an AI powered summary of content from the file.
-* RAG application developer/search index admin: This persona is responsible for configuring the search index schema, and setting up the indexer and skillset to ingest language model augmented data into the index. GenAI Prompt custom skill allows developers to configure free-form prompts to several models hosted in AI foundry, thereby offering significant flexibility to light up various scenarios. However, developers need to ensure that the combination of data + skill configuration used in the pipeline doesn't produce harmful or unsafe content. Developers also need to evaluate the content generated by the language models for bias, inaccuracies, and incorrect information. This becomes particularly challenging to do for documents at a large scale and should be one of the first steps when building a RAG application, along with the index schema definition.
-* Data authority: This persona is expected to be the key subject matter expert (SME) for the content from the data source. The SME is expected to be the best judge of language model powered enrichments ingested into the index and the answer generated by the language model in the RAG application. The key role for the data authority to be able to get a representative sample and verify the quality of the enrichments and the answer, which can be challenging if dealing with data at large scale.
+| Persona | Description |
+|---------|-------------|
+| End user | The person asking questions of the RAG application, expecting a well-cited answer to their question based on results from the source document. In addition to accuracy of the answer, the end-user expects that any citations provided by the application make it clear if it was from verbatim content from a source file or an AI-powered summary from the model. |
+| RAG application developer/search index admin | The person responsible for configuring the search index schema, and setting up the indexer and skillset to ingest language model augmented data into the index. GenAI Prompt custom skill allows developers to configure free-form prompts to several models hosted in AI foundry, thereby offering significant flexibility to light up various scenarios. However, developers need to ensure that the combination of data and skills used in the pipeline doesn't produce harmful or unsafe content. Developers also need to evaluate the content generated by the language models for bias, inaccuracies, and incorrect information. Although this task can be challenging for documents at a large scale, it should be one of the first steps when building a RAG application, along with the index schema definition. |
+| Data authority | The person expected to be the key subject matter expert (SME) for the content from the data source. The SME is expected to be the best judge of language model powered enrichments ingested into the index and the answer generated by the language model in the RAG application. The key role for the data authority to be able to get a representative sample and verify the quality of the enrichments and the answer, which can be challenging if dealing with data at large scale. |
The rest of this document lists out these various challenges along with tips and best practices that RAG application developers can follow to mitigate any risks.
## Challenges
-The following are the key challenges faced by the various personas that interact with a RAG systems that utilize language models to augment content ingested into a search index (using the GenAI Prompt custom skill) and to formulate answers for questions:
+The following challenges are faced by the various personas that interact with a RAG systems that utilize language models to augment content ingested into a search index (using the GenAI Prompt custom skill) and to formulate answers for questions:
+
+* Transparency: Users of RAG systems should understand that AI models might not always produce accurate or well-formulated answers. Azure AI Search has a robustly documented [Transparency Note](/legal/search/transparency-note) that developers should read through to understand the various ways in which AI is used to augment the capabilities of the core search engine. It's recommended that developers who build RAG applications share the transparency note to users of their applications, since they might be unaware of how AI interfaces with various aspects of the application being used. Additionally, when utilizing the GenAI Prompt custom skill developers should note that only part of the content ingested into the search index is generated by the language model and should highlight this to users of their applications.
-* Transparency: Users of RAG systems should understand that many the system is powered by AI models that might not always be accurate in the content ingested or the answer formulated. Azure AI Search has a robustly documented [Transparency Note](/legal/search/transparency-note) that developers should read through to understand the various ways in which AI is used to augment the capabilities of the core search engine. It's recommended that developers who build RAG applications share the transparency note to users of their applications, since they might be unaware of how AI interfaces with various aspects of the application being used. Additionally, when utilizing the GenAI Prompt custom skill developers should note that only part of the content ingested into the search index is generated by the language model and should highlight this to users of their applications.
* Content sampling/inspection of content quality: Developers and data SMEs should consider sampling some of the content ingested into the search index after being augmented by the GenAI Prompt custom skill in order to inspect the quality of the enrichment performed by their language model. [Debug sessions](cognitive-search-debug-session.md) and [search explorer](search-explorer.md) on the Azure portal can be used for this purpose.
-* Content safety filtering and evaluations: It's important for developers to ensure that the language models they use with the GenAI Prompt custom skill have appropriate filters to ensure safety of the content generated and after ingested into the search index. Developers and data SMEs should also make sure they evaluate the content generated by the language model on various metrics such as accuracy, task specific performance, bias, and risk. Azure AI Foundry offers a robust set of tools for developers to add [content safety filters](../ai-foundry/ai-services/content-safety-overview.md) and [clear guidance for evaluation approaches](../ai-foundry/concepts/evaluation-approach-gen-ai.md)
-* Being agile in rolling back changes or modifying skill configuration: It's possible for the language model that is used with the GenAI Prompt custom skill to have issues over time (such as producing low-quality content). Developers should be prepared to roll back these changes either by altering their indexer and skillset configuration or by excluding index fields with AI generated content from search queries.
+
+* Content safety filtering and evaluations: It's important for developers to ensure that the language models they use with the GenAI Prompt custom skill have appropriate filters to ensure safety of the content generated and after ingested into the search index. Developers and data SMEs should also make sure they evaluate the content generated by the language model on various metrics such as accuracy, task specific performance, bias, and risk. Azure AI Foundry offers a robust set of tools for developers to add [content safety filters](../ai-foundry/ai-services/content-safety-overview.md) and [clear guidance for evaluation approaches](../ai-foundry/concepts/evaluation-approach-gen-ai.md).
+
+* Agility in rolling back changes or modifying skill configuration: It's possible for the language model used with the GenAI Prompt custom skill to have issues over time (such as producing low-quality content). Developers should be prepared to roll back these changes either by altering their indexer and skillset configuration or by excluding index fields with AI generated content from search queries.
## Best practices to mitigate risks
When utilizing the GenAI Prompt custom skill to power RAG applications, there's a risk of over-reliance on AI as outlined in the challenges from the previous section. In this part of the document, we present some patterns and strategies to use to mitigate the risks and overcome the challenges.
### Content sampling and inspection before ingestion into the search index
-[Debug sessions](cognitive-search-debug-session.md) is an Azure AI Search feature available to customers who utilize the Azure portal to inspect the state of enrichment for a single document. To utilize a debug session, Azure AI Search customers need to create a skillset, and an indexer and have the indexer complete one run. We recommend customers that have an indexer utilizing the GenAI Prompt custom skill to initially ingest content into a "development" index - such an indexer can be used with a debug session to inspect the entire structure and contents of the enriched document that will be written into the index. A single run of a debug session works with one specific live document, and will have the content generated by the language model show up in a specific part of the enriched document. Developers can utilize several runs of their debug session, pointing to different documents from their data source to get a reasonable idea of the state of the content produced by their language model (and its relationship to the enriched document structure). The images below show how developers can inspect both the configuration of a skill and the values produced by the skill after calling the language model.
+[Debug sessions](cognitive-search-debug-session.md) is a tool built into the Azure portal. You can use it to inspect the state of enrichment for a single document. To start a debug session, create a skillset, and an indexer and have the indexer complete one run. We recommend that you begin with a "development" index before moving forward with solution. While the index is in development, use a debug session to inspect the entire structure and contents of the enriched document that will be written into the index. A single run of a debug session works with one specific live document, and will have the content generated by the language model show up in a specific part of the enriched document. Developers can utilize several runs of their debug session, pointing to different documents from their data source to get a reasonable idea of the state of the content produced by their language model (and its relationship to the enriched document structure).
-#### Inspecting the configuration of the GenAI Prompt skill
+ The screenshots below show how developers can inspect both the configuration of a skill and the values produced by the skill after calling the language model.
-[  ](./media/responsible-ai-practices-genai-prompt-skill/debug-session-skill-inspection.png#lightbox)
+#### Example: Inspect the configuration of the GenAI Prompt skill
+[  ](./media/responsible-ai-practices-genai-prompt-skill/debug-session-skill-inspection.png#lightbox)
-#### Inspecting the output from the GenAI Prompt skill
+#### Example: Inspect the output from the GenAI Prompt skill
[  ](./media/responsible-ai-practices-genai-prompt-skill/debug-session-skill-output-inspection.png#lightbox)
+#### Use Search Explorer to inspect output
+
In addition to debug sessions, Azure AI Search also offers the ability to explore multiple documents at once by querying the search index via the Azure portal [search explorer](search-explorer.md). Developers can issue a broad query to retrieve a large number of documents from their search index and can inspect the fields which have their content generated by the GenAI Prompt custom skill. To be able to view the contents of the field, when defining the index schema it needs to be configured with the "Retrievable" property. For the same document that was inspected via the debug session, the image below shows the full contents of the search document that ends up into the index.
[  ](./media/responsible-ai-practices-genai-prompt-skill/search-explorer-inspect-document.png#lightbox)
@@ -79,15 +87,14 @@ The previous two sections stressed the importance for developers to have a "deve
Once the evaluation in the development environment is satisfactory, developers should transition the ingestion process to a production environment, where the indexer operates on the full customer data. However, it's possible for there to be unexpected drops in quality or performance when operating on this data set. It's also possible for the model to be updated without undergoing evaluation in the development environment - both these cases can result in a suboptimal experience for users interacting with RAG applications, and developers need to be agile in detecting and mitigating such conditions. To catch such situations, developers should ensure that they also have a constant monitoring of their "production" index and be ready to modify configurations as needed. The following sections describe some patterns developers could adopt to be responsive to such scenarios.
-#### Primary-Seconday index powering RAG applications
+#### Primary-Secondary index powering RAG applications
Developers should consider having a primary and a secondary index to power their RAG applications. The primary and secondary indexes would be similar in the configuration of fields - the only difference would be that the primary index will have an extra (searchable and retrievable) field which will contain content generated from the language model through the GenAI Prompt custom skill. Developers should configure their RAG applications such that the AI model being augmented can use either the primary or the secondary index as it's knowledge source. The primary index should be preferred, but if the quality of the results produced by the RAG application seems to be adversely impacted, the application should swap to using the secondary index which doesn't have generated content as part of the knowledge source. This can be achieved without needing any code change/redeployment of the RAG app by utilizing the [index alias feature](search-how-to-alias.md) and having the RAG application query the alias, and then swapping the indexes that map to the alias if necessary.
The following diagram illustrates this pattern.
[  ](./media/responsible-ai-practices-genai-prompt-skill/fallback-index-pattern.png#lightbox)
-
#### Dropping use of generated field in search queries
A lighter weight alternative to having two copies of the search index, is to ensure that the RAG application can modify the search query issued to Azure AI Search easily. By default when a search query is issued, all searchable fields are scanned, however Azure AI Search allows specifying which fields must be analyzed to produce a set of search results.
@@ -104,6 +111,7 @@ POST https://[service-name].search.windows.net/indexes/[index-name]?api-version=
"queryType": "full"
}
```
+
The RAG application can fall back to this specific query (might require a code change/redeployment), if the default query starts to degrade in performance or evaluation metrics, illustrated by the following diagram.
[  ](./media/responsible-ai-practices-genai-prompt-skill/fallback-query-pattern.png#lightbox)
@@ -128,6 +136,6 @@ Given the scale of data ingestion, it might not be feasible to have a human in t
## Learn more about Azure AI Search
-* [Introduction to Azure AI Search](search-what-is-azure-search.md)
-* [AI enrichment concepts](cognitive-search-concept-intro.md)
-* [Retrieval Augmented Generation (RAG) in Azure AI Search](retrieval-augmented-generation-overview.md)
\ No newline at end of file
+* [Introduction to Azure AI Search](search-what-is-azure-search.md)
+* [AI enrichment concepts](cognitive-search-concept-intro.md)
+* [Retrieval Augmented Generation (RAG) in Azure AI Search](retrieval-augmented-generation-overview.md)
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Updated responsible AI best practices for the GenAI Prompt skill"
}
```
#### Explanation

This change applies a minor update to the articles/search/responsible-ai-best-practices-genai-prompt-skill.md file. The guidance on responsible AI best practices for the GenAI Prompt skill was reorganized, most visibly by converting the persona descriptions into a table. This makes the information easier to scan, so readers can take in each persona and its challenges at a glance.

The update improves the following points:

- The description of each persona (end user, RAG application developer/search index admin, data authority) is more structured, stating what each expects and the challenges each faces.
- The challenges around transparency and content quality are laid out concisely, making the risks developers can run into explicit.
- The importance of sampling and evaluating generated content is explained, with debug sessions and Search Explorer recommended as tools for verifying language model output; a fallback-query pattern is also described for retreating from AI-generated fields, sketched below.

Overall, the update gives developers and stakeholders practical guidance and promotes responsible use of AI, emphasizing risk management and mitigation when incorporating generated content.
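The fallback-query pattern narrows `searchFields` so that AI-generated fields drop out of scoring without reindexing. A minimal sketch, assuming a hypothetical index where `chunk` holds verbatim text and a separate generated field is to be excluded:

```http
POST https://[service-name].search.windows.net/indexes/[index-name]/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: [query-key]

{
  "search": "energy consumption trends",
  "searchFields": "title, chunk",
  "queryType": "full"
}
```

Because `searchFields` limits which searchable fields are scanned, omitting the generated field removes its influence on results while leaving the index itself untouched.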
### articles/search/search-get-started-portal-image-search.md
#### Diff

````diff
@@ -465,7 +465,7 @@ This quickstart uses billable Azure resources. If you no longer need the resourc
This quickstart introduced you to the **Import and vectorize data** wizard, which creates all of the necessary objects for multimodal search. To explore each step in detail, see the following tutorials:
-+ [Tutorial: Image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md)
-+ [Tutorial: Image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md)
-+ [Tutorial: Multimodal embeddings and Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md)
-+ [Tutorial: Multimodal embeddings and Document Layout skill](tutorial-document-layout-multimodal-embeddings.md)
++ [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md)
++ [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
++ [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md)
++ [Tutorial: Vectorize from a structured document layout](tutorial-document-layout-multimodal-embeddings.md)
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Updated the portal image search quickstart tutorials"
}
```
#### Explanation

This change applies a minor update to the articles/search/search-get-started-portal-image-search.md file. Specifically, the next-steps section of the Azure portal image search quickstart was revised to use clearer, more current wording.

The main update is that the titles of the follow-on tutorials were replaced with more specific phrasing. Entries such as "Image verbalization and Document Extraction skill" and "Multimodal embeddings and Document Layout skill" became concrete, easy-to-parse titles such as "Verbalize images using generative AI" and "Verbalize images from a structured document layout".

This change helps users get to the content they're interested in faster and makes learning easier. It also signals that the tutorials reflect current techniques and approaches.

Overall, this minor update improves the experience around the portal image search quickstart and provides better learning resources.
### articles/search/search-relevance-overview.md
#### Diff

````diff
@@ -13,14 +13,19 @@ ms.date: 07/23/2025
# Relevance in Azure AI Search
-In a query operation, the relevance of any given result is determined by a ranking algorithm that evaluates the strength of a match based on how closely the indexed content and the query align. An algorithm assigns a score, and results are ranked by that score and returned in the response.
+In a query operation, the relevance of any given result is determined by a ranking algorithm that evaluates the strength of a match based on how closely the query corresponds to an indexed document. When a match is found, an algorithm assigns a score, and results are ranked by that score and the topmost results are returned in the response.
Ranking occurs whenever the query request includes full text or vector queries. It doesn't occur if the query invokes strict pattern matching, such as a filter-only query or a specialized query form like autocomplete, suggestions, geospatial search, fuzzy search, or regular expression search. A uniform search score of 1.0 indicates the absence of a ranking algorithm.
-***Relevance tuning*** can be used to boost search scores based on extra criteria such as freshness or proximity. In Azure AI Search, relevance tuning is primarily directed at textual and numeric (nonvector) content when you apply a [scoring profile](#custom-boosting-logic-using-scoring-profiles) or invoke the [semantic ranker](semantic-search-overview.md).
+## Relevance tuning
-> [!NOTE]
-> In Azure AI Search, there's no explicit relevance tuning capabilities for vector content, but you can experiment between Hierarchical Navigable Small World (HNSW) and exhaustive K-nearest neighbors (KNN) to see if one algorithm outperforms the other for your scenario. HNSW graphing with an exhaustive KNN override at query time is the most flexible approach for comparison testing. You can also experiment with various embedding models to see which ones produce higher quality results.
+***Relevance tuning*** is a technique for boosting search scores based on extra criteria such as weighted fields, freshness, or proximity. In Azure AI Search, relevance tuning options vary based on query type:
+
++ For textual and numeric (nonvector) content in keyword or hybrid search, you can tune relevance through [scoring profiles](#custom-boosting-logic-using-scoring-profiles) or invoking the [semantic ranker](semantic-search-overview.md).
+
++ For vector content in a hybrid query, you can [weight a vector field](hybrid-search-ranking.md#weighted-scores) to boost the importance of the vector component relative to the text component of the hybrid query.
+
++ For pure vector queries, you can experiment between Hierarchical Navigable Small World (HNSW) and exhaustive K-nearest neighbors (KNN) to see if one algorithm outperforms the other for your scenario. HNSW graphing with an exhaustive KNN override at query time is the most flexible approach for comparison testing. You can also experiment with various embedding models to see which ones produce higher quality results. Finally, remember that a hybrid query or a vector query on documents that include nonvector fields are in-scope for relevance tuning, so it's just the vector fields themselves that can't participate in a relevance tuning effort.
## Levels of ranking
@@ -42,7 +47,7 @@ Scoring logic applies to text and numeric nonvector content. You can use scoring
+ [Text (keyword) search](search-query-create.md)
+ [Pure vector queries](vector-search-how-to-query.md)
-+ [Hybrid queries](hybrid-search-how-to-query.md), with text and vector subqueries execute in parallel
++ [Hybrid queries](hybrid-search-how-to-query.md), where text and vector subqueries execute in parallel
+ [Semantically ranked queries](semantic-how-to-query-request.md)
For standalone text queries, scoring profiles identify the top 1,000 matches in a [BM25-ranked search](index-similarity-and-scoring.md), with the top 50 matches returned in the response.
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Updated the relevance overview for Azure AI Search"
}
```
#### Explanation

This change applies a minor update to the articles/search/search-relevance-overview.md file. The explanation of search result relevance in Azure AI Search was reworked with more specific and detailed information.

The main improvement is in the description of the ranking process: when a match is found, a score is assigned based on how closely the query corresponds to an indexed document, and results are ranked by that score with the topmost results returned. The former note on relevance tuning was also promoted to a dedicated "Relevance tuning" section that organizes the options by query type.

Especially notable is the concrete guidance for each case: scoring profiles and the semantic ranker for text and numeric (nonvector) content, field weighting for the vector component of a hybrid query, and algorithm or embedding model comparisons for pure vector queries. This organization makes the tuning options much easier to follow.

Overall, the change helps developers and users understand result relevance better and apply it effectively, giving them the knowledge to improve search precision and the search experience.
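The vector weighting option called out in the new section applies per vector query in a hybrid request. A hedged sketch, assuming an index with a `contentVector` field and a vectorizer that converts query text at search time (names and values are placeholders):

```http
POST https://[service-name].search.windows.net/indexes/[index-name]/docs/search?api-version=2024-07-01
Content-Type: application/json
api-key: [query-key]

{
  "search": "sustainable building materials",
  "vectorQueries": [
    {
      "kind": "text",
      "text": "sustainable building materials",
      "fields": "contentVector",
      "k": 10,
      "weight": 2.0
    }
  ]
}
```

Here `weight: 2.0` boosts the vector component's contribution relative to the text component when the hybrid scores are fused.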
### articles/search/toc.yml
#### Diff

````diff
@@ -138,14 +138,14 @@ items:
href: tutorial-adls-gen2-indexer-acls.md
- name: Multimodal indexing tutorials
items:
- - name: Use document extraction and multimodal embeddings
+ - name: Vectorize images and text
href: tutorial-document-extraction-multimodal-embeddings.md
- - name: Use document extraction and image verbalizations
- href: tutorial-document-extraction-image-verbalization.md
- - name: Use semantic chunking and multimodal embeddings
+ - name: Vectorize from a structured document layout
href: tutorial-document-layout-multimodal-embeddings.md
- - name: Use semantic chunking and image verbalizations
- href: tutorial-document-layout-image-verbalization.md
+ - name: Verbalize images using generative AI
+ href: tutorial-document-extraction-image-verbalization.md
+ - name: Verbalize images from a structured document layout
+ href: tutorial-document-layout-image-verbalization.md
- name: RAG tutorials
items:
- name: Build a RAG solution
@@ -364,22 +364,28 @@ items:
href: cognitive-search-output-field-mapping.md
- name: Process image files
href: cognitive-search-concept-image-scenarios.md
- - name: Configure an enrichment cache
- href: enrichment-cache-how-to-configure.md
- - name: Manage an enrichment cache
- href: enrichment-cache-how-to-manage.md
- - name: Best practices - GenAI Prompt skill
- href: responsible-ai-best-practices-genai-prompt-skill.md
- - name: GenAI Prompt Skill - Example Usage Guide
- href: chat-completion-skill-example-usage.md
+ - name: Enrichment cache
+ items:
+ - name: Configure an enrichment cache
+ href: enrichment-cache-how-to-configure.md
+ - name: Manage an enrichment cache
+ href: enrichment-cache-how-to-manage.md
+ - name: Generative AI skills
+ items:
+ - name: Add AI-generated content (GenAI Prompt skill)
+ href: chat-completion-skill-example-usage.md
+ - name: Best practices using GenAI Prompt skill
+ href: responsible-ai-best-practices-genai-prompt-skill.md
- name: Custom skills
items:
- - name: Integrate custom skills
+ - name: Add custom skills
href: cognitive-search-custom-skill-interface.md
- name: Scale out custom skills
href: cognitive-search-custom-skill-scale.md
- name: Example - Bing Entity Search
href: cognitive-search-create-custom-skill-example.md
+ - name: Azure AI Search Power Skills
+ href: https://github.com/Azure-Samples/azure-search-power-skills
- name: Retrieval
items:
- name: Agentic retrieval
````
#### Summary

```json
{
  "modification_type": "minor update",
  "modification_title": "Updated the Azure AI Search table of contents"
}
```
#### Explanation

This change applies a minor update to the articles/search/toc.yml file. It renames and reorganizes the tutorial entries and sections for Azure AI Search.

First, several tutorial titles were changed to reflect their content more directly, for example from "Use document extraction and multimodal embeddings" to "Vectorize images and text", emphasizing the technique the reader will apply.

Second, older flat entries were regrouped. A new "Generative AI skills" subsection collects the pages on adding AI-generated content and its best practices, the enrichment cache pages were nested under an "Enrichment cache" node, and a link to the Azure AI Search Power Skills repository was added under custom skills.

Overall, this update tightens the structure of the Azure AI Search documentation and improves navigation, so users can reach current tutorials and features more easily and learn more efficiently.
### articles/search/tutorial-document-extraction-image-verbalization.md
#### Diff

````diff
@@ -1,5 +1,5 @@
---
-title: 'Tutorial: Use Image Verbalization and Document Extraction Skill for Multimodal Indexing'
+title: 'Tutorial: Verbalize images using generative AI'
titleSuffix: Azure AI Search
description: Learn how to extract, index, and search multimodal content using the Document Extraction skill for chunking and GenAI Prompt skill for image verbalizations.
@@ -14,49 +14,42 @@ ms.date: 05/29/2025
---
-# Tutorial: Index mixed content using image verbalizations and the Document Extraction skill
+# Tutorial: Verbalize images using generative AI
-Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline by describing visual content in natural language and embedding it alongside document text.
+Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline that includes steps for describing visual content in natural language and using the generated descriptions in your searchable index.
-From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities: text and verbalized images.
+From the source document, each image is passed to the [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) that calls a chat completion model to generate a concise textual description. These descriptions, along with the original document text, are then embedded into vector representations using Azure OpenAI’s text-embedding-3-large model. The result is a single index containing semantically searchable content from both modalities: text and verbalized images.
In this tutorial, you use:
+ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
-+ The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting normalized images and text.
++ An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
-+ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions, which are text-based descriptions of visual content, for search and grounding.
++ The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting normalized images and text.
-+ A search index configured to store text and image embeddings and support for vector-based similarity search.
++ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) that calls a chat completion model to create descriptions of visual content.
-This tutorial demonstrates a lower-cost approach for indexing multimodal content using Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it doesn't include locational metadata for text, such as page numbers or bounding regions.
++ A search index configured to store text and image verbalizations.
-For a more comprehensive solution that includes structured text layout and spatial metadata, see [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md).
+This tutorial demonstrates a lower-cost approach for indexing multimodal content using the Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it doesn't include locational metadata for text, such as page numbers or bounding regions. For a more comprehensive solution that includes structured text layout and spatial metadata, see [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md).
> [!NOTE]
-> Setting `imageAction` to `generateNormalizedImages` is required for this tutorial and incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
-
-Using a REST client and the [Search REST APIs](/rest/api/searchservice/) you will:
-
-> [!div class="checklist"]
-> + Set up sample data and configure an `azureblob` data source
-> + Create an index with support for text and image embeddings
-> + Define a skillset with extraction, captioning, and embedding steps
-> + Create and run an indexer to process and index content
-> + Search the index you just created
+> Setting `imageAction` to `generateNormalizedImages` results in image extraction, which is an extra charge. For more information, see [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/) for image extraction.
## Prerequisites
-+ An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
++ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. It must also be in the same region as your multi-service account.
-+ [Azure Storage](/azure/storage/common/storage-account-create).
++ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
-+ [Azure AI Search](search-what-is-azure-search.md), Basic pricing tier or higher, with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription.
++ A chat completion model hosted in Azure AI Foundry or another source. The model is used to verbalize image content. You provide the URI to the hosted model in the GenAI Prompt skill definition.
+
++ A text embedding model deployed in Azure AI Foundry. The model is used to vectorize text content pulled from source documents and the image descriptions generated by the chat completion model. For integrated vectorization, the embedding model must be located in Azure AI Foundry, and it must be either text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. If you want to use an external embedding model, use a custom skill instead of the Azure OpenAI embedding skill.
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
-### Download files
+## Prepare data
Download the following sample PDF:
@@ -68,7 +61,7 @@ Download the following sample PDF:
1. [Upload the sample data file](/azure/storage/blobs/storage-quickstart-blobs-portal).
-1. [Create a role assignment in Azure Storage and Specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
+1. [Create a **Storage Blob Data Reader** role assignment and specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
1. For connections made using a system-assigned managed identity, provide a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string is similar to the following example:
@@ -77,6 +70,7 @@ Download the following sample PDF:
"connectionString" : "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;"
}
```
+
1. For connections made using a user-assigned managed identity, provide a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. Provide an identity using the syntax shown in the following example. Set userAssignedIdentity to the user-assigned managed identity. The connection string is similar to the following example:
```json
@@ -339,7 +333,9 @@ Key points:
## Create a skillset
-[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
+[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the built-in Document Extraction skill to extract text and images. It uses Text Split skill to chunk large text. It uses Azure OpenAI Embedding skill to vectorize text content.
+
+The skillset also performs actions specific to images. It uses the GenAI Prompt skill to generate image descriptions. It also creates a knowledge store that stores intact images so that you can return them in a query.
```http
### Create a skillset
@@ -354,7 +350,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
{
"@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
"name": "document-extraction-skill",
- "description": "Document extraction skill to exract text and images from documents",
+ "description": "Document extraction skill to extract text and images from documents",
"parsingMode": "default",
"dataToExtract": "contentAndMetadata",
"configuration": {
@@ -458,7 +454,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
},
{
"@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
- "name": "verblized-image-embedding-skill",
+ "name": "verbalized-image-embedding-skill",
"description": "Embedding skill for verbalized images",
"context": "/document/normalized_images/*",
"inputs": [
@@ -752,4 +748,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
* [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md)
* [Vectors in Azure AI Search](vector-search-overview.md)
* [Semantic ranking in Azure AI Search](semantic-search-overview.md)
-* [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md)
+* [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
Summary
{
"modification_type": "minor update",
"modification_title": "画像の生成的AIによる記述を使用したチュートリアルの更新"
}
Explanation
This change is a minor update to the articles/search/tutorial-document-extraction-image-verbalization.md file. The tutorial's title and content are revised, with added emphasis on the process of describing images with generative AI.
Retitling the tutorial around generative AI image descriptions makes the focus of this update clear. The body was adjusted to explain more concretely how image descriptions are produced; for example, it now states explicitly that the GenAI Prompt skill uses a chat completion model to generate concise text descriptions of visual content.
The tutorial also goes into more detail on the skills and indexer configuration it uses, adding descriptions of the Document Extraction skill and of an indexing pipeline with AI enrichment. This makes the concrete steps and prerequisites for indexing multimodal content easier to understand.
Overall, the update aims to go beyond describing system capabilities and to provide concrete information that deepens understanding during hands-on work, so that users can learn effectively how to describe images with generative AI.
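For context, the centerpiece of the updated tutorial is the GenAI Prompt skill. The following is a minimal sketch of what its definition might look like in a skillset; the endpoint URI, prompt text, and target field name are hypothetical placeholders, and the preview schema can change, so treat [GenAI Prompt skill](cognitive-search-skill-genai-prompt.md) as the authoritative reference.

```json
{
  "@odata.type": "#Microsoft.Skills.Custom.ChatCompletionSkill",
  "name": "genai-prompt-skill",
  "description": "Calls a chat completion model to verbalize each extracted image",
  "context": "/document/normalized_images/*",
  "uri": "https://YOUR-FOUNDRY-RESOURCE.openai.azure.com/openai/deployments/YOUR-CHAT-MODEL/chat/completions?api-version=2025-01-01-preview",
  "timeout": "PT1M",
  "inputs": [
    { "name": "systemMessage", "source": "='You describe images for a search index. Be concise and factual.'" },
    { "name": "userMessage", "source": "='Describe this image.'" },
    { "name": "image", "source": "/document/normalized_images/*/data" }
  ],
  "outputs": [
    { "name": "response", "targetName": "verbalizedImage" }
  ]
}
```

The `response` output is then associated with an index field through the indexer's output field mappings, which is how the verbalized text becomes searchable.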
articles/search/tutorial-document-extraction-multimodal-embeddings.md
Diff
@@ -1,5 +1,5 @@
---
-title: 'Tutorial: Use Multimodal Embeddings and Document Extraction Skill for Multimodal Indexing'
+title: 'Tutorial: Vectorize images and text'
titleSuffix: Azure AI Search
description: Learn how to extract, index, and search multimodal content using the Document Extraction skill for chunking and Azure AI Vision for embeddings.
@@ -13,51 +13,39 @@ ms.topic: tutorial
ms.date: 06/11/2025
---
-
-# Tutorial: Index mixed content using multimodal embeddings and the Document Extraction skill
+<!-- # Tutorial: Index mixed content using multimodal embeddings and the Document Extraction skill -->
+# Tutorial: Vectorize images and text
Azure AI Search can extract and index both text and images from PDF documents stored in Azure Blob Storage. This tutorial shows you how to build a multimodal indexing pipeline by embedding both text and images into a unified semantic search index.
In this tutorial, you use:
+ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
-+ The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting text and normalized images.
++ An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
-+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.
++ The [Document Extraction skill](cognitive-search-skill-document-extraction.md) for extracting normalized images and text.
-+ A search index configured to store text and image embeddings and support for vector-based similarity search.
++ The [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) to vectorize text and images.
-This tutorial demonstrates a lower-cost approach for indexing multimodal content using Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it doesn't include locational metadata for text, such as page numbers or bounding regions.
++ A search index configured to store text and image embeddings and support for vector-based similarity search.
-For a more comprehensive solution that includes structured text layout and spatial metadata, see [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md).
+This tutorial demonstrates a lower-cost approach for indexing multimodal content using the Document Extraction skill and image captioning. It enables extraction and search over both text and images from documents in Azure Blob Storage. However, it doesn't include locational metadata for text, such as page numbers or bounding regions. For a more comprehensive solution that includes structured text layout and spatial metadata, see [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md).
> [!NOTE]
-> Setting `imageAction` to `generateNormalizedImages` as is required for this tutorial incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
-
-Using a REST client and the [Search REST APIs](/rest/api/searchservice/) you will:
-
-> [!div class="checklist"]
-> + Set up sample data and configure an `azureblob` data source
-> + Create an index with support for text and image embeddings
-> + Define a skillset with extraction and embedding steps
-> + Create and run an indexer to process and index content
-> + Search the index you just created
+> Setting `imageAction` to `generateNormalizedImages` results in image extraction, which is an extra charge. For more information, see [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/) for image extraction.
## Prerequisites
-+ An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
-
-+ [Azure Storage](/azure/storage/common/storage-account-create).
++ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. It must also be in the same region as your multi-service account.
-+ An [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) for image vectorization. Image vectorization requires Azure AI Vision multimodal embeddings. For an updated list of regions, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
++ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
-+ [Azure AI Search](search-what-is-azure-search.md), with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription.
- > Your service must be on the Basic tier or higher—this tutorial isn't supported on the Free tier. It must also be in the same region as your multi-service account.
++ An [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) that provides Azure AI Vision for multimodal embeddings. You must use an Azure AI multi-service account for this task. For an updated list of regions that provide multimodal embeddings, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
-### Download files
+## Prepare data
Download the following sample PDF:
@@ -69,7 +57,7 @@ Download the following sample PDF:
1. [Upload the sample data file](/azure/storage/blobs/storage-quickstart-blobs-portal).
-1. [Create a role assignment in Azure Storage and Specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
+1. [Create a **Storage Blob Data Reader** role assignment and specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
1. For connections made using a system-assigned managed identity, provide a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string is similar to the following example:
@@ -339,7 +327,7 @@ Key points:
## Create a skillset
-[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
+[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the built-in Document Extraction skill to extract text and images. It uses Text Split skill to chunk large text. It uses Azure AI Vision multimodal embeddings skill to vectorize image and text content.
```http
### Create a skillset
@@ -354,7 +342,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
{
"@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
"name": "document-extraction-skill",
- "description": "Document extraction skill to exract text and images from documents",
+ "description": "Document extraction skill to extract text and images from documents",
"parsingMode": "default",
"dataToExtract": "contentAndMetadata",
"configuration": {
@@ -712,4 +700,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
* [AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md)
* [Vectors in Azure AI Search](vector-search-overview.md)
* [Semantic ranking in Azure AI Search](semantic-search-overview.md)
-* [Indexing blobs with text and images for multimodal RAG scenarios using image verbalization and Document Layout skill](tutorial-document-layout-image-verbalization.md)
+* [Tutorial: Verbalize images from a structured document layout](tutorial-document-layout-image-verbalization.md)
Summary
{
"modification_type": "minor update",
"modification_title": "マルチモーダル埋め込みを使用した文書抽出のチュートリアルの更新"
}
Explanation
This change is a minor update to the articles/search/tutorial-document-extraction-multimodal-embeddings.md file. The tutorial's title and description are revised to clarify the multimodal indexing approach built on Azure AI Search.
First, the title changed from 'Tutorial: Use Multimodal Embeddings and Document Extraction Skill for Multimodal Indexing' to 'Tutorial: Vectorize images and text', which gets to the heart of what the tutorial covers. Readers can now tell at a glance what they will learn.
The content was adjusted in several places as well; in particular, the material on the Document Extraction skill and the Azure AI Vision multimodal embeddings skill was reorganized. Older passages were replaced with new ones, and the tutorial now explains more clearly how to build an indexing pipeline that handles both text and images.
In addition, the tutorial's purpose, the dataset it uses, and the required skillset are spelled out, improving the user experience. Points to watch during setup, along with the relevant roles and permissions, are also explained more clearly.
In short, the update organizes the steps for working with multimodal content so that users can learn them effectively. Overall, it provides practical guidance on extracting and indexing documents with Azure AI Search.
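For context, a minimal sketch of the Azure AI Vision multimodal embeddings skill that the revised tutorial centers on is shown below; the skill and field names are illustrative placeholders, and [AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) documents the full schema.

```json
{
  "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill",
  "name": "image-embedding-skill",
  "description": "Generates an embedding for each normalized image extracted from the PDF",
  "context": "/document/normalized_images/*",
  "modelVersion": "2023-04-15",
  "inputs": [
    { "name": "image", "source": "/document/normalized_images/*" }
  ],
  "outputs": [
    { "name": "vector", "targetName": "imageVector" }
  ]
}
```

A second instance of the same skill type, with a `text` input instead of `image`, handles the chunked text. Because text and images are embedded by the same model, both land in one embedding space and can be retrieved by a single vector query.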
articles/search/tutorial-document-layout-image-verbalization.md
Diff
@@ -1,5 +1,5 @@
---
-title: 'Tutorial: Use Image Verbalization and Document Layout Skill for Multimodal Indexing'
+title: 'Tutorial: Verbalize images from a structured document layout'
titleSuffix: Azure AI Search
description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and GenAI Prompt skill for image verbalizations.
@@ -14,7 +14,7 @@ ms.date: 05/29/2025
---
-# Tutorial: Index mixed content using image verbalizations and the Document Layout skill
+# Tutorial: Verbalize images from a structured document layout
In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure and uses image verbalization to describe images. Cropped images are stored in a knowledge store, and visual content is described in natural language and ingested alongside text in a searchable index.
@@ -24,37 +24,31 @@ In this tutorial, you use:
+ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
-+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
++ An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
- The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md).
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its `locationMetadata` from various documents, such as page numbers or bounding regions.
-+ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) to generate image captions, which are text-based descriptions of visual content, for search and grounding.
++ The [GenAI Prompt skill (preview)](cognitive-search-skill-genai-prompt.md) that calls a chat completion model to create descriptions of visual content.
-+ A search index configured to store text and image embeddings and support for vector-based similarity search.
++ A search index configured to store extracted text and image verbalizations. Some content is vectorized for vector-based similarity search.
-> [!NOTE]
-> Setting `imageAction` to `generateNormalizedImages` is required for this tutorial and incurs an additional charge for image extraction according to [Azure AI Search pricing](https://azure.microsoft.com/pricing/details/search/).
+## Prerequisites
-Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:
++ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. It must also be in the same region as your multi-service account.
-> [!div class="checklist"]
-> + Set up sample data and configure an `azureblob` data source
-> + Create an index with support for text and image embeddings
-> + Define a skillset with extraction, captioning, embedding and knowleage store file projection steps
-> + Create and run an indexer to process and index content
-> + Search the index you just created
++ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
-## Prerequisites
++ A chat completion model hosted in Azure AI Foundry or another source. The model is used to verbalize image content. You provide the URI to the hosted model in the GenAI Prompt skill definition.
-+ An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
++ A text embedding model deployed in Azure AI Foundry. The model is used to vectorize text content pulled from source documents and the image descriptions generated by the chat completion model. For integrated vectorization, the embedding model must be located in Azure AI Foundry, and it must be either text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. If you want to use an external embedding model, use a custom skill instead of the Azure OpenAI embedding skill.
-+ [Azure Storage](/azure/storage/common/storage-account-create).
++ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
-+ [Azure AI Search](search-what-is-azure-search.md). [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier.
+## Limitations
-+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md).
-### Download files
+## Prepare data
Download the following sample PDF:
@@ -66,7 +60,7 @@ Download the following sample PDF:
1. [Upload the sample data file](/azure/storage/blobs/storage-quickstart-blobs-portal).
-1. [Create a role assignment in Azure Storage and Specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
+1. [Create a **Storage Blob Data Reader** role assignment and specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
1. For connections made using a system-assigned managed identity, provide a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string is similar to the following example:
@@ -302,7 +296,10 @@ Key points:
## Create a skillset
-[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
+[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the Document Layout skill to extract text and images, preserving location metadata which is useful for citations in RAG applications. It uses Azure OpenAI Embedding skill to vectorize text content.
+
+The skillset also performs actions specific to images. It uses the GenAI Prompt skill to generate image descriptions. It also creates a knowledge store that stores intact images so that you can return them in a query.
+
```http
### Create a skillset
Summary
{
"modification_type": "minor update",
"modification_title": "構造化された文書レイアウトから画像を記述するチュートリアルの更新"
}
Explanation
This change is a minor update to the articles/search/tutorial-document-layout-image-verbalization.md file. The tutorial's title, content, and structure are adjusted to give users clearer, more concrete guidance.
First, the title changed from 'Tutorial: Use Image Verbalization and Document Layout Skill for Multimodal Indexing' to 'Tutorial: Verbalize images from a structured document layout', sharpening the tutorial's focus. The body likewise clarifies the image verbalization process and how data is chunked based on document structure.
Specifically, the descriptions of the Document Layout skill and the GenAI Prompt skill were updated to explain in more detail how these technologies help process images and text. The conditions and constraints for implementing each skill are also covered, with particular emphasis on the Document Layout skill's limited regional availability and its usage limits.
In addition, the other resources and preparation required for preprocessing and indexing are stated clearly, including information about the knowledge store and Azure AI Foundry. The process by which visual content is described in natural language has also been refined, resulting in a better user experience.
Overall, the update is meant to help users learn effectively how to describe images with generative AI and index documents, and it lays out clear guidelines for future use.
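For context, the knowledge store that holds the intact, cropped images is declared inside the skillset. The following is a minimal sketch of that kind of file projection; the container name and connection string are hypothetical placeholders, and [knowledge store](knowledge-store-concept-intro.md) documents the full projection syntax.

```json
"knowledgeStore": {
  "storageConnectionString": "<connection string or ResourceId for Azure Storage>",
  "projections": [
    {
      "tables": [],
      "objects": [],
      "files": [
        {
          "storageContainer": "verbalized-images",
          "source": "/document/normalized_images/*"
        }
      ]
    }
  ]
}
```

Each normalized image is written to the container as a file, so a query response can link back to the original visual for grounding or citation.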
articles/search/tutorial-document-layout-multimodal-embeddings.md
Diff
@@ -1,5 +1,5 @@
---
-title: 'Tutorial: Use Multimodal Embeddings and Document Layout Skill for Multimodal Indexing'
+title: 'Tutorial: Vectorize from a structured document layout'
titleSuffix: Azure AI Search
description: Learn how to extract, index, and search multimodal content using the Document Layout skill for chunking and Azure AI Vision for embeddings.
@@ -14,7 +14,7 @@ ms.date: 06/11/2025
---
-# Tutorial: Index mixed content using multimodal embeddings and the Document Layout skill
+# Tutorial: Vectorize from a structured document layout
<!-- Multimodal plays an essential role in generative AI apps and the user experience as it enables the extraction of information not only from text but also from complex images embedded within documents. -->
In this Azure AI Search tutorial, learn how to build a multimodal indexing pipeline that chunks data based on document structure, and uses a multimodal embedding model to vectorize text and images in a searchable index.
@@ -23,49 +23,41 @@ In this tutorial, you use:
+ A 36-page PDF document that combines rich visual content, such as charts, infographics, and scanned pages, with traditional text.
-+ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its locationMetadata from various documents, such as page numbers or bounding regions.
++ An indexer and skillset to create an indexing pipeline that includes AI enrichment through skills.
- The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Index multimodal content using image verbalization and Document Extraction skill](tutorial-document-extraction-image-verbalization.md).
++ The [Document Layout skill (preview)](cognitive-search-skill-document-intelligence-layout.md) for extracting text and normalized images with its `locationMetadata` from various documents, such as page numbers or bounding regions.
-+ Vectorization using the [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md), which generates embeddings for both text and images.
++ The [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) to vectorize text and images.
-+ A search index configured to store text and image embeddings and support for vector-based similarity search.
-
-Using a REST client and the [Search REST APIs](/rest/api/searchservice/), you will:
-
-> [!div class="checklist"]
-> + Set up sample data and configure an `azureblob` data source
-> + Create an index with support for text and image embeddings
-> + Define a skillset with extraction, embedding and knowleage store file projection steps
-> + Create and run an indexer to process and index content
-> + Search the index you just created
++ A search index configured to store extracted text and image verbalizations. Some content is vectorized for vector-based similarity search.
## Prerequisites
-+ An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).
-
-+ [Azure Storage](/azure/storage/common/storage-account-create).
++ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. It must also be in the same region as your multi-service account.
-+ An [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) for image vectorization. Image vectorization requires Azure AI Vision multimodal embeddings. For an updated list of regions, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
++ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
-+ [Azure AI Search](search-what-is-azure-search.md), with a managed identity. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) in your current subscription. Your service must be on the Basic tier or higher—this tutorial isn't supported on the Free tier. It must also be in the same region as your multi-service account.
++ An [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) that provides Azure AI Vision for multimodal embeddings. You must use an Azure AI multi-service account for this task. For an updated list of regions that provide multimodal embeddings, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
-### Download files
+## Limitations
+
+The [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) has limited regional availability, is bound to Azure AI services, and requires a [billable resource](cognitive-search-attach-cognitive-services.md) for transactions that exceed 20 documents per indexer per day. For a lower-cost solution to indexing multimodal content, see [Tutorial: Verbalize images using generative AI](tutorial-document-extraction-image-verbalization.md).
+
+## Prepare data
Download the following sample PDF:
+ [sustainable-ai-pdf](https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/msc/documents/presentations/CSR/Accelerating-Sustainability-with-AI-2025.pdf)
-
### Upload sample data to Azure Storage
1. In Azure Storage, create a new container named **doc-intelligence-multimodality-container**.
1. [Upload the sample data file](/azure/storage/blobs/storage-quickstart-blobs-portal).
-1. [Create a role assignment in Azure Storage and specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
+1. [Create a **Storage Blob Data Reader** role assignment and specify a managed identity in a connection string](search-howto-managed-identities-storage.md)
1. For connections made using a system-assigned managed identity, provide a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string is similar to the following example:
@@ -299,7 +291,7 @@ Key points:
## Create a skillset
-[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
+[Create Skillset (REST)](/rest/api/searchservice/skillsets/create) creates a skillset on your search service. A skillset defines the operations that chunk and embed content prior to indexing. This skillset uses the Document Layout skill to extract text and images, preserving location metadata which is useful for citations in RAG applications. It uses Azure AI Vision multimodal embeddings skill to vectorize image and text content.
```http
### Create a skillset
@@ -614,4 +606,4 @@ Now that you're familiar with a sample implementation of a multimodal indexing s
+ [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md)
+ [Vectors in Azure AI Search](vector-search-overview.md)
+ [Semantic ranking in Azure AI Search](semantic-search-overview.md)
-+ [Index multimodal content using embeddings and Document Extraction skill](tutorial-document-extraction-multimodal-embeddings.md)
++ [Tutorial: Vectorize images and text](tutorial-document-extraction-multimodal-embeddings.md)
Summary
{
"modification_type": "minor update",
"modification_title": "構造化された文書レイアウトからのベクタライゼーションに関するチュートリアルの更新"
}
Explanation
This change is a minor update to the articles/search/tutorial-document-layout-multimodal-embeddings.md file, adjusting the tutorial's title and content. The result is a structure that deepens users' understanding of vectorization from a structured document layout.
The title changed from 'Tutorial: Use Multimodal Embeddings and Document Layout Skill for Multimodal Indexing' to 'Tutorial: Vectorize from a structured document layout', clarifying the tutorial's intent and making its content easier to grasp.
Much of the content was restructured as well; in particular, the material on the Document Layout skill and on vectorization with Azure AI Vision was reorganized. Details on building the new indexing pipeline, the required resources, and the authentication configuration were updated, providing more practical guidance.
Each step now spells out concrete procedures and caveats, with detailed coverage of data preparation and knowledge store creation. The prerequisites are stated clearly, and the explanation of managing and configuring the Azure AI Search service has been fleshed out.
Overall, the update aims to give users a useful resource for acquiring the knowledge needed for multimodal indexing, with a focus on extracting and processing information using generative AI.
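For context, the embeddings this pipeline produces land in vector fields in the search index. The sketch below shows the minimal index pieces involved: a vector field sized for the 1024-dimension output of Azure AI Vision multimodal embeddings, wired to an HNSW profile. The field and profile names are illustrative; see [Vectors in Azure AI Search](vector-search-overview.md) for the full schema.

```json
{
  "fields": [
    {
      "name": "content_embedding",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "retrievable": false,
      "dimensions": 1024,
      "vectorSearchProfile": "hnsw-profile"
    }
  ],
  "vectorSearch": {
    "algorithms": [
      { "name": "hnsw-config", "kind": "hnsw" }
    ],
    "profiles": [
      { "name": "hnsw-profile", "algorithm": "hnsw-config" }
    ]
  }
}
```

At query time, a vector query against `content_embedding` can return both text chunks and images by similarity, since both were embedded with the same model.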